1
|
Artificial intelligence and database for NGS-based diagnosis in rare disease. Front Genet 2024; 14:1258083. [PMID: 38371307 PMCID: PMC10870236 DOI: 10.3389/fgene.2023.1258083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 11/24/2023] [Indexed: 02/20/2024]
Abstract
Rare diseases (RDs) are complex genetic diseases that affect, by a conservative estimate, 300 million people worldwide. Recent Next-Generation Sequencing (NGS) studies are unraveling the underlying genetic heterogeneity of this group of diseases. NGS-based methods used in RD studies have improved the diagnosis and management of RDs. Concomitantly, a suite of bioinformatics tools has been developed to sort through the big data generated by NGS to understand RDs better. However, there are concerns regarding the lack of consistency among different methods, primarily linked to factors such as the lack of uniformity in input and output formats, the absence of a standardized measure for predictive accuracy, and the regularity of updates to the annotation database. Today, artificial intelligence (AI), particularly deep learning, is widely used in a variety of biological contexts, changing the healthcare system. AI has demonstrated promising capabilities in boosting variant calling precision, refining variant prediction, and enhancing the user-friendliness of electronic health record (EHR) systems in NGS-based diagnostics. This paper reviews the state of the art of AI in NGS-based genetics, and its future directions and challenges. It also compares several rare disease databases.
|
2
|
A review of genetic variant databases and machine learning tools for predicting the pathogenicity of breast cancer. Brief Bioinform 2023; 25:bbad479. [PMID: 38149678 PMCID: PMC10782903 DOI: 10.1093/bib/bbad479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 09/22/2023] [Accepted: 12/04/2023] [Indexed: 12/28/2023]
Abstract
Studies continue to uncover contributing risk factors for breast cancer (BC) development, including genetic variants. Advances in machine learning and big data generated from genetic sequencing can now be used for predicting BC pathogenicity. However, it is unclear which of the tools developed for pathogenicity prediction is best suited to predicting the impact and pathogenicity of variant effects. A significant challenge is to determine the most suitable data source for each tool, since different tools can yield different prediction results with different data inputs. To this end, this work reviews genetic variant databases and tools used specifically for the prediction of BC pathogenicity. We provide a description of existing genetic variant databases and, where appropriate, the diseases for which they have been established. Through examples, we illustrate how they can be used for prediction of BC pathogenicity and discuss their associated advantages and disadvantages. We conclude that tools that are specialized by training on multiple diverse datasets from different databases for the same disease have enhanced accuracy and specificity and are thereby more helpful to clinicians in predicting and diagnosing BC as early as possible.
|
3
|
A review of SARS-CoV-2 drug repurposing: databases and machine learning models. Front Pharmacol 2023; 14:1182465. [PMID: 37601065 PMCID: PMC10436567 DOI: 10.3389/fphar.2023.1182465] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Accepted: 07/06/2023] [Indexed: 08/22/2023]
Abstract
The emergence of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) posed a serious worldwide threat and emphasized the urgency of finding efficient solutions to combat the spread of the virus. Drug repurposing has attracted more attention than traditional approaches due to its potential for a time- and cost-effective discovery of new applications for existing FDA-approved drugs. Given the reported success of machine learning (ML) in virtual drug screening, ML is a promising approach for identifying potential SARS-CoV-2 inhibitors. The implementation of ML in drug repurposing requires reliable digital databases from which the data of interest can be extracted. Numerous databases archive research data from studies so that it can be used for different purposes. This article reviews two aspects: the databases frequently used in ML-based drug repurposing studies for SARS-CoV-2, and the recent ML models that have been developed for the prospective prediction of potential inhibitors against the new virus. Both types of ML models, Deep Learning models and conventional ML models, are reviewed in terms of introduction, methodology, and their recent applications in the prospective prediction of SARS-CoV-2 inhibitors. Furthermore, the features and limitations of the databases are provided to guide researchers in choosing suitable databases according to their research interests.
|
4
|
Artificial Bee Colony algorithm in estimating kinetic parameters for yeast fermentation pathway. J Integr Bioinform 2023:jib-2022-0051. [PMID: 37341516 PMCID: PMC10389048 DOI: 10.1515/jib-2022-0051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 05/15/2023] [Indexed: 06/22/2023]
Abstract
Analyzing metabolic pathways in systems biology requires accurate kinetic parameters that represent the simulated in vivo processes. Simulating the fermentation pathway in the Saccharomyces cerevisiae kinetic model helps save much time in the optimization process. Fitting the simulated model to the experimental data is categorized as a parameter estimation problem. Parameter estimation is conducted to obtain the optimal values for parameters related to the fermentation process. This step is essential because insufficient identification of model parameters can cause erroneous conclusions. The kinetic parameters cannot be measured directly. Therefore, they must be estimated from experimental data obtained either in vitro or in vivo. Parameter estimation is a challenging task in biological processes due to the complexity and nonlinearity of the model. Therefore, we propose the Artificial Bee Colony algorithm (ABC) to estimate the parameters in the fermentation pathway of S. cerevisiae to obtain more accurate values. A metabolite with a total of six parameters is involved in this article. The experimental results show that ABC outperforms other estimation algorithms and gives more accurate kinetic parameter values for the simulated model. Most of the estimated kinetic parameter values obtained from the proposed algorithm are the closest to the experimental data.
|
5
|
Recent Advancements and Challenges of AIoT Application in Smart Agriculture: A Review. SENSORS (BASEL, SWITZERLAND) 2023; 23:3752. [PMID: 37050812 PMCID: PMC10098529 DOI: 10.3390/s23073752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2023] [Revised: 03/10/2023] [Accepted: 03/28/2023] [Indexed: 06/19/2023]
Abstract
As the most popular technologies of the 21st century, artificial intelligence (AI) and the internet of things (IoT) are the most effective paradigms that have played a vital role in transforming the agricultural industry during the pandemic. The convergence of AI and IoT has sparked a recent wave of interest in artificial intelligence of things (AIoT). An IoT system provides data flow to AI techniques for data integration and interpretation as well as for the performance of automatic image analysis and data prediction. The adoption of AIoT technology significantly transforms the traditional agriculture scenario by addressing numerous challenges, including pest management and post-harvest management issues. Although AIoT is an essential driving force for smart agriculture, there are still some barriers that must be overcome. In this paper, a systematic literature review of AIoT is presented to highlight the current progress, its applications, and its advantages. The AIoT concept, from smart devices in IoT systems to the adoption of AI techniques, is discussed. The increasing trend in article publication on AIoT topics is presented based on a database search process. Lastly, the challenges to the adoption of AIoT technology in modern agriculture are also discussed.
|
6
|
A hybrid of Bees algorithm and regulatory on/off minimization for optimizing lactate and succinate production. J Integr Bioinform 2022; 19:jib-2022-0003. [PMID: 35852123 PMCID: PMC9521821 DOI: 10.1515/jib-2022-0003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2022] [Accepted: 05/26/2022] [Indexed: 12/03/2022]
Abstract
Metabolic engineering has grown in importance and use in recent years and is now extensively applied, particularly in the production of biomass from microbes. Metabolic network models have been employed extensively in computational processes developed to enhance metabolic production and suggest changes in organisms. The crucial issue has been the unrealistic flux distribution presented in prior work on rational modelling frameworks adopting OptKnock and OptGene. To address this problem, a hybrid of the Bees Algorithm and Regulatory On/Off Minimization (BAROOM) is used. With Escherichia coli as the model organism, the best set of genes in E. coli to remove in order to improve the production of succinate can be determined. Evidence shows that BAROOM outperforms alternative strategies used to increase succinate production in model organisms like E. coli by selecting the best set of genes to be removed.
|
7
|
Enhanced Directed Random Walk for the Identification of Breast Cancer Prognostic Markers from Multiclass Expression Data. ENTROPY (BASEL, SWITZERLAND) 2021; 23:1232. [PMID: 34573857 PMCID: PMC8472068 DOI: 10.3390/e23091232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/15/2021] [Revised: 09/14/2021] [Accepted: 09/16/2021] [Indexed: 12/12/2022]
Abstract
Artificial intelligence in healthcare can potentially identify the probability of contracting a particular disease more accurately. There are five common molecular subtypes of breast cancer: luminal A, luminal B, basal, ERBB2, and normal-like. Previous investigations showed that pathway-based microarray analysis could help in the identification of prognostic markers from gene expressions. For example, directed random walk (DRW) can infer a greater reproducibility power of the pathway activity between two classes of samples with a higher classification accuracy. However, most of the existing methods (including DRW) ignored the characteristics of different cancer subtypes and considered all of the pathways to contribute equally to the analysis. Therefore, an enhanced DRW (eDRW+) is proposed to identify breast cancer prognostic markers from multiclass expression data. An improved weight strategy using one-way ANOVA (F-test) and pathway selection based on the greatest reproducibility power is proposed in eDRW+. The experimental results show that the eDRW+ exceeds other methods in terms of AUC. Besides this, the eDRW+ identifies 294 gene markers and 45 pathway markers from the breast cancer datasets with better AUC. Therefore, the prognostic markers (pathway markers and gene markers) can identify drug targets and look for cancer subtypes with clinically distinct outcomes.
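The one-way ANOVA F-test mentioned in the abstract can be illustrated with a minimal Python sketch. It only computes the F statistic for one gene across several sample classes; how eDRW+ turns that statistic into a weight is not described in the abstract and is not shown here. The function name and the expression values are hypothetical.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way ANOVA F statistic for k groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: spread of values around their group mean.
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical expression values of one gene in three sample classes.
expression_by_class = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(expression_by_class)
print(f_stat)
```

A larger F value means the gene's mean expression differs more between classes relative to the variation within each class, which is why it is a natural candidate for a multiclass weight.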
|
8
|
SLC17A3 rs9379800 and Ischemic Stroke Susceptibility at the Northern Region of Malaysia. J Stroke Cerebrovasc Dis 2021; 30:105908. [PMID: 34384670 DOI: 10.1016/j.jstrokecerebrovasdis.2021.105908] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/15/2021] [Revised: 04/24/2021] [Accepted: 05/19/2021] [Indexed: 10/20/2022]
Abstract
OBJECTIVES The relationships of Paired Like Homeodomain 2 (PITX2), Ninjurin 2 (NINJ2), TWIST-Related Protein 1 (TWIST1), Ras Interacting Protein 1 (Rasip1), Solute Carrier Family 17 Member 3 (SLC17A3), Methylmalonyl Co-A Mutase (MUT) and Fer3 Like BHLH Transcription Factor (FERD3L) polymorphisms and gene expression with ischemic stroke have yet to be determined in Malaysia. Hence, this study aimed to explore the associations of single nucleotide polymorphisms (SNPs) and gene expression with ischemic stroke risk among the population residing in the Northern region of Malaysia. MATERIALS AND METHODS Study subjects, comprising 216 ischemic stroke patients and 203 healthy controls, were recruited upon obtaining ethical clearance. SNP genotyping was performed using polymerase chain reaction-restriction fragment length polymorphism assays. Gene expression levels were quantified by real-time polymerase chain reaction assays. Statistical and genetic analyses were conducted with SPSS version 22.2, PLINK version 1.07 and multifactor dimensionality reduction software. RESULTS Study subjects with the G allele, or the CG or GG genotypes, of SLC17A3 rs9379800 demonstrated an increased risk of ischemic stroke, with odds ratios ranging from 1.76 to 3.14 (p<0.05). When study subjects were stratified by ethnicity, the SLC17A3 rs9379800 G allele and CG genotype were significantly associated with 2.14- and 2.96-fold increases in ischemic stroke risk, respectively, among the Malay population in the multivariate analysis (p<0.05). However, no significant associations were observed for PITX2, NINJ2, TWIST1, Rasip1, and MUT polymorphisms with ischemic stroke risk in the multivariate analysis, either for the pooled cases and controls or when they were stratified by ethnicity. Lower mRNA expression levels of Rasip1, SLC17A3, MUT and FERD3L were observed among cases (p<0.05). After FDR adjustment, the mRNA level of SLC17A3 remained significantly associated with ischemic stroke among the Malay population (q=0.034).
CONCLUSION In conclusion, this study suggests that the SLC17A3 rs9379800 polymorphism and its gene expression contribute significantly to ischemic stroke risk among the Malaysian population, particularly Malays residing in the Northern Region of the country. Our findings can provide useful information for the future diagnosis, management and treatment of ischemic stroke patients.
|
9
|
In silico gene knockout prediction using a hybrid of Bat algorithm and minimization of metabolic adjustment. J Integr Bioinform 2021; 18:jib-2020-0037. [PMID: 34348418 PMCID: PMC8573224 DOI: 10.1515/jib-2020-0037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2020] [Accepted: 06/21/2021] [Indexed: 11/17/2022]
Abstract
Microorganisms commonly produce many high-demand industrial products like fuels, food, vitamins, and other chemicals. Microbial strains are the strains of microorganisms, which can be optimized to improve their technological properties through metabolic engineering. Metabolic engineering is the process of overcoming cellular regulation in order to achieve a desired product or to generate a new product that the host cells do not usually need to produce. The prediction of genetic manipulations such as gene knockout is part of metabolic engineering. Gene knockout can be used to optimize the microbial strains, such as to maximize the production rate of chemicals of interest. Metabolic and genetic engineering is important in producing the chemicals of interest as, without them, the product yields of many microorganisms are normally low. As a result, the aim of this paper is to propose a combination of the Bat algorithm and the minimization of metabolic adjustment (BATMOMA) to predict which genes to knock out in order to increase the succinate and lactate production rates in Escherichia coli (E. coli).
|
10
|
Supervised and Unsupervised Machine Learning for Cancer Classification: Recent Development. 2021 IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC CONTROL & INTELLIGENT SYSTEMS (I2CACIS) 2021. [DOI: 10.1109/i2cacis52118.2021.9495888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]
|
11
|
Deepint.net: A Rapid Deployment Platform for Smart Territories. SENSORS 2021; 21:s21010236. [PMID: 33401468 PMCID: PMC7795292 DOI: 10.3390/s21010236] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/20/2020] [Revised: 12/24/2020] [Accepted: 12/28/2020] [Indexed: 11/27/2022]
Abstract
This paper presents an efficient cyberphysical platform for the smart management of smart territories. It is efficient because it facilitates the implementation of data acquisition and data management methods, as well as data representation and dashboard configuration. The platform allows for the use of any type of data source, ranging from the measurements of multi-functional IoT sensing devices to relational and non-relational databases. It is also smart because it incorporates a complete artificial intelligence suite for data analysis; it includes techniques for data classification, clustering, forecasting, optimization, visualization, etc. It is also compatible with the edge computing concept, allowing for the distribution of intelligence and the use of intelligent sensors. The concept of smart cities is evolving and adapting to new applications; the trend to create intelligent neighbourhoods, districts or territories is becoming increasingly popular, as opposed to the previous approach of managing an entire megacity. In this paper, the platform is presented, and its architecture and functionalities are described. Moreover, its operation has been validated in a case study in which the Paris bike rental service, Vélib’ Métropole, was managed. This platform could enable smart territories to develop adapted knowledge management systems, adapt them to new requirements, use multiple types of data, and execute efficient computational and artificial intelligence algorithms. The platform optimizes the decisions taken by human experts through explainable artificial intelligence models that obtain data from IoT sensors, databases, the Internet, etc. The global intelligence of the platform could potentially coordinate its decision-making processes with intelligent nodes installed at the edge, which would use the most advanced data processing techniques.
|
12
|
Comparison of Optimization-Modelling Methods for Metabolites Production in Escherichia coli. J Integr Bioinform 2020; 17:jib-2019-0073. [PMID: 32374287 PMCID: PMC7734505 DOI: 10.1515/jib-2019-0073] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/30/2019] [Accepted: 01/18/2020] [Indexed: 11/15/2022]
Abstract
The metabolic network is the reconstruction of the metabolic pathway of an organism and is used to represent the interactions between enzymes and metabolites at the genome level. Meanwhile, metabolic engineering is a process that modifies the metabolic network of a cell to increase the production of metabolites. However, metabolic networks are so complex that it is difficult to identify near-optimal knockout genes/reactions for maximizing metabolite production. Therefore, through constraint-based modelling, various metaheuristic algorithms have been adapted to optimize the desired phenotypes. In this paper, PSOMOMA was compared with CSMOMA and ABCMOMA for maximizing the production of succinic acid in E. coli. Furthermore, the results obtained from PSOMOMA were validated against results from a wet lab experiment.
|
13
|
A non-dominated sorting Differential Search Algorithm Flux Balance Analysis (ndsDSAFBA) for in silico multiobjective optimization in identifying reactions knockout. Comput Biol Med 2019; 113:103390. [PMID: 31450056 DOI: 10.1016/j.compbiomed.2019.103390] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/20/2019] [Revised: 08/15/2019] [Accepted: 08/15/2019] [Indexed: 01/06/2023]
Abstract
Metabolic engineering is defined as improving the cellular activities of an organism by manipulating the metabolic, signal or regulatory network. In silico reaction knockout simulation is one of the techniques applied to analyse the effects of genetic perturbations on metabolite production. Many methods consider growth coupling as the objective function, whereby it searches for mutants that maximise the growth and production rate. However, the final goal is to increase the production rate. Furthermore, they produce one single solution, though in reality, cells do not focus on one objective and they need to consider various different competing objectives. In this work, a method, termed ndsDSAFBA (non-dominated sorting Differential Search Algorithm and Flux Balance Analysis), has been developed to find the reaction knockouts involved in maximising the production rate and growth rate of the mutant, by incorporating Pareto dominance concepts. The proposed ndsDSAFBA method was validated using three genome-scale metabolic models. We obtained a set of non-dominated solutions, with each solution representing a different mutant strain. The results obtained were compared with the single objective optimisation (SOO) and multi-objective optimisation (MOO) methods. The results demonstrate that ndsDSAFBA is better than the other methods in terms of production rate and growth rate.
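The Pareto dominance idea behind the set of non-dominated solutions can be sketched in a few lines of Python. This is only an illustration of the dominance test with both objectives maximized; it is not the ndsDSAFBA algorithm, and the function names and the (production rate, growth rate) values are hypothetical.

```python
from typing import List, Tuple

Solution = Tuple[float, float]  # (production rate, growth rate), both maximized


def dominates(a: Solution, b: Solution) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions: List[Solution]) -> List[Solution]:
    """Keep the solutions that no other solution dominates (the Pareto front)."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Hypothetical mutant strains scored as (production rate, growth rate).
mutants = [(5.0, 0.2), (4.0, 0.5), (3.0, 0.4), (1.0, 0.9), (4.0, 0.1)]
front = non_dominated(mutants)
print(front)
```

Each point on the resulting front represents a different trade-off between production and growth, which matches the abstract's description of every non-dominated solution corresponding to a different mutant strain.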
|
15
|
An Improved Scatter Search Algorithm for Parameter Estimation in Large-Scale Kinetic Models of Biochemical Systems. CURR PROTEOMICS 2019. [DOI: 10.2174/1570164616666190401203128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Background: Mathematical models play a central role in helping researchers to better understand and comprehensively analyze various processes in biochemical systems. Their usage is beneficial in metabolic engineering as they help predict and improve desired products. However, one of the primary challenges in model building is parameter estimation. It is the process of finding near-optimal values of kinetic parameters that give the best fit of model prediction to experimental data.
Methods: This paper proposes an improved scatter search algorithm to address the challenging parameter estimation problem. The improved algorithm is based on the hybridization of quasi opposition-based learning with the enhanced scatter search method (QOBLESS). The algorithm is tested using a large-scale metabolic model of Chinese Hamster Ovary (CHO) cells.
Results: The experimental results show that the proposed algorithm performs better than other algorithms in terms of convergence speed and the minimum value of the objective function (log-likelihood). The estimated parameters from the experiment produce a better model, giving a reasonably good fit of model prediction to the experimental data.
Conclusion: The kinetic parameter values obtained in our work gave a reasonably good fit of model prediction to the experimental data, which contributes to a better understanding and a more accurate model. Based on the results, the QOBLESS method can be used as an efficient parameter estimation method in large-scale kinetic model building.
|
16
|
Topologically significant directed random walk with applied walker network in cancer environment. PAKISTAN JOURNAL OF PHARMACEUTICAL SCIENCES 2019; 32:1395-1408. [PMID: 31551221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Numerous cancer studies have combined different datasets for the prognosis of patients. This study incorporated four networks into significant directed random walk (sDRW) to predict cancerous genes and risk pathways. The study investigated the feasibility of cancer prediction via different networks. In this study, multiple microarray datasets were analysed and used in the experiment. Six gene expression datasets were applied to four networks to study the effectiveness of the networks in sDRW in terms of cancer prediction. The experimental results showed that one of the proposed networks outperformed the other networks. That network is therefore proposed for use in sDRW as the walker network. This study provides a foundation for further studies and research on other networks. We hope these findings will improve prognostic methods for cancer patients.
|
17
|
Identifying a Gene Knockout Strategy Using a Hybrid of Simple Constrained Artificial Bee Colony Algorithm and Flux Balance Analysis to Enhance the Production of Succinate and Lactate in Escherichia Coli. Interdiscip Sci 2019; 11:33-44. [DOI: 10.1007/s12539-019-00324-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/23/2018] [Revised: 01/11/2019] [Accepted: 02/04/2019] [Indexed: 11/29/2022]
|
18
|
Abstract
In gene expression studies, missing values are a common problem with important consequences for the interpretation of the final data (Satija et al., Nat Biotechnol 33(5):495, 2015). Numerous bioinformatics examination tools are used for cancer prediction, including the data set matrix (Bailey et al., Cell 173(2):371-385, 2018); thus, it is necessary to resolve the problem of missing-values imputation. This chapter presents a review of the research on missing-values imputation approaches for gene expression data. By using local and global correlation of the data, we were able to focus mostly on the differences between the algorithms. We classified the algorithms as global, hybrid, local, or knowledge-based techniques. Additionally, this chapter presents suitable assessments of the different approaches. The purpose of this review is to focus on developments in the current techniques for scientists rather than applying different or newly developed algorithms with identical functional goals. The aim was to adapt the algorithms to the characteristics of the data.
|
19
|
A hybrid of Cuckoo Search and Minimization of Metabolic Adjustment to optimize metabolites production in genome-scale models. Comput Biol Med 2018; 102:112-119. [DOI: 10.1016/j.compbiomed.2018.09.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/27/2018] [Revised: 09/16/2018] [Accepted: 09/16/2018] [Indexed: 10/28/2022]
|
21
|
NAHAL-Flex: A Numerical and Alphabetical Hinge Detection Algorithm for Flexible Protein Structure Alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:934-943. [PMID: 28534783 DOI: 10.1109/tcbb.2017.2705080] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Flexible proteins are proteins that undergo conformational changes in their structures. Protein flexibility analysis is critical for classifying and understanding protein functionality. For that analysis, the hinge areas where proteins show flexibility must be detected. To detect the location of the hinges, previous methods have utilized the three-dimensional (3D) structure of proteins, which is computationally expensive. To reduce the computational complexity, this study proposes a novel text-based method using structural alphabets (SAs) for detecting the hinge position, called NAHAL-Flex. Protein structures were encoded into a particular type of SA called the protein folding shape code (PFSC), which remains unaffected by location, scale, and rotation. The flexible regions of the proteins are the only places in which letter sequences can be distorted. With this knowledge, it is possible to find the longest alignment path of two letter sequences using a dynamic programming (DP) algorithm. Then, the proposed method looks for regions where the alphabet sequence is distorted to find the most probable hinge positions. In order to reduce the number of hinge positions, a genetic algorithm (GA) was utilized to find the best candidate hinge points. To evaluate the method's effectiveness, four different flexible and rigid protein databases, including two small datasets and two large datasets, were utilized. For the small datasets, the NAHAL-Flex method was comparable to state-of-the-art structural flexible alignment methods. The results for the large datasets show that NAHAL-Flex outperforms some well-known alignment methods, e.g., DaliLite, Matt, DeepAlign, and TM-align; NAHAL-Flex was faster and its results were more accurate than those of the other methods.
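The dynamic programming step can be illustrated with a short Python sketch. It assumes, as a simplification, that the "longest alignment path" behaves like a longest common subsequence between two letter strings; the scoring actually used by NAHAL-Flex is not given in the abstract. The function name is hypothetical and the example strings are arbitrary letters, not real PFSC codes.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two letter sequences."""
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, letter_a in enumerate(a, start=1):
        for j, letter_b in enumerate(b, start=1):
            if letter_a == letter_b:
                # Matching letters extend the best path of the shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise skip one letter from either sequence.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


# Arbitrary example strings standing in for two encoded structures.
print(lcs_length("AABBCD", "ABXBCD"))
```

The table takes time and memory proportional to the product of the two sequence lengths, which is far cheaper than comparing full 3D coordinates and is the kind of saving the abstract attributes to the text-based encoding.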
|
22
|
An enhanced topologically significant directed random walk in cancer classification using gene expression datasets. Saudi J Biol Sci 2018; 24:1828-1841. [PMID: 29551932 PMCID: PMC5851940 DOI: 10.1016/j.sjbs.2017.11.024] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/21/2017] [Revised: 11/08/2017] [Accepted: 11/09/2017] [Indexed: 02/07/2023]
Abstract
Microarray technology has become one of the elementary tools for researchers to study the genome of organisms. As the complexity and heterogeneity of cancer are increasingly appreciated through genomic analysis, cancer classification is an emerging and important trend. Significant directed random walk is proposed as a cancer classification approach that has higher sensitivity in risk gene prediction and higher accuracy in cancer classification. In this paper, the methodology and materials used for the experiment are presented. A tuning parameter selection method and the use of weight as a parameter are applied in the proposed approach. Gene expression datasets are used as the input datasets, while a pathway dataset is used as the reference dataset to build a directed graph that completes the bias process in the random walk approach. In addition, we demonstrate that our approach can improve the sensitivity of predictions while giving higher accuracy and biologically meaningful classification results. Significant directed random walk is compared with directed random walk to show the improvement in terms of sensitivity of prediction and accuracy of cancer classification.
|
23
|
Pathway-based Analysis with Support Vector Machine (SVM-LASSO) for Gene Selection and Classification. Int J Adv Sci Eng Inf Technol 2017. [DOI: 10.18517/ijaseit.7.4-2.3397] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
|
24
|
An improved hybrid of particle swarm optimization and the gravitational search algorithm to produce a kinetic parameter estimation of aspartate biochemical pathways. Biosystems 2017; 162:81-89. [PMID: 28951204 DOI: 10.1016/j.biosystems.2017.09.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/06/2016] [Revised: 06/23/2017] [Accepted: 09/21/2017] [Indexed: 11/17/2022]
Abstract
Mathematical modelling is fundamental to understand the dynamic behavior and regulation of the biochemical metabolisms and pathways that are found in biological systems. Pathways are used to describe complex processes that involve many parameters. It is important to have an accurate and complete set of parameters that describe the characteristics of a given model. However, measuring these parameters is typically difficult and even impossible in some cases. Furthermore, the experimental data are often incomplete and also suffer from experimental noise. These shortcomings make it challenging to identify the best-fit parameters that can represent the actual biological processes involved in biological systems. Computational approaches are required to estimate these parameters. The estimation is converted into multimodal optimization problems that require a global optimization algorithm that can avoid local solutions. These local solutions can lead to a bad fit when calibrating with a model. Although the model itself can potentially match a set of experimental data, a high-performance estimation algorithm is required to improve the quality of the solutions. This paper describes an improved hybrid of particle swarm optimization and the gravitational search algorithm (IPSOGSA) to improve the efficiency of a global optimum (the best set of kinetic parameter values) search. The findings suggest that the proposed algorithm is capable of narrowing down the search space by exploiting the feasible solution areas. Hence, the proposed algorithm is able to achieve a near-optimal set of parameters at a fast convergence speed. The proposed algorithm was tested and evaluated based on two aspartate pathways that were obtained from the BioModels Database. The results show that the proposed algorithm outperformed other standard optimization algorithms in terms of accuracy and near-optimal kinetic parameter estimation. 
Nevertheless, the proposed algorithm is only expected to work well in small scale systems. In addition, the results of this study can be used to estimate kinetic parameter values in the stage of model selection for different experimental conditions.
|
25
|
Metaheuristic Optimization for Parameter Estimation in Kinetic Models of Biological Systems - Recent Development and Future Direction. Curr Bioinform 2017. [DOI: 10.2174/1574893611666161018142809] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
|
26
|
Samira-VP: A simple protein alignment method with rechecking the alphabet vector positions. J Bioinform Comput Biol 2017; 15:1750004. [PMID: 28274174 DOI: 10.1142/s0219720017500044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Protein structure alignment and comparisons that are based on an alphabetical representation of protein structure are simpler to run and have faster evaluation processes; however, their accuracy is not as reliable as that of three-dimensional (3D)-based tools. As a 1D method candidate, TS-AMIR used the alphabetic representation of secondary-structure elements (SSE) of proteins and compared the assigned letters to each SSE using the [Formula: see text]-gram method. Although the results were comparable to those obtained via geometrical methods, the SSE length and accuracy of adjacency between SSEs were not considered in the comparison process. Therefore, to obtain further information on the accuracy of adjacency between SSE vectors, a new approach of assigning text to vectors according to the spherical coordinate system was adopted in the present study. Moreover, dynamic programming was applied in order to account for the length of SSE vectors. Five common datasets were selected for method evaluation. The first three datasets were small, but difficult to align, and the remaining two datasets were used to compare the capability of the proposed method with that of other methods on a large protein dataset. The results showed that the proposed method, as a text-based alignment approach, obtained results comparable to both 1D and 3D methods. It outperformed 1D methods in terms of accuracy and 3D methods in terms of runtime.
|
27
|
Identification of informative genes and pathways using an improved penalized support vector machine with a weighting scheme. Comput Biol Med 2016; 77:102-15. [PMID: 27522238 DOI: 10.1016/j.compbiomed.2016.08.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/21/2016] [Revised: 08/03/2016] [Accepted: 08/03/2016] [Indexed: 01/03/2023]
Abstract
Incorporation of pathway knowledge into microarray analysis has brought better biological interpretation of the analysis outcome. However, most pathway data are manually curated without specific biological context. Non-informative genes could be included when the pathway data are used for analysis of context-specific data such as cancer microarray data. Therefore, efficient identification of informative genes is essential. Embedded methods like penalized classifiers have been used for microarray analysis due to their embedded gene selection. This paper proposes an improved penalized support vector machine with an absolute t-test weighting scheme to identify informative genes and pathways. Experiments are done on four microarray data sets. The results are compared with previous methods using 10-fold cross validation in terms of accuracy, sensitivity, specificity and F-score. Our method shows consistent improvement over the previous methods, and biological validation has been done to elucidate the relation of the selected genes and pathways with the phenotype under study.
|
28
|
A Review of Gene Knockout Strategies for Microbial Cells. Recent Pat Biotechnol 2016; 9:176-97. [PMID: 27185502 DOI: 10.2174/1872208310666160517115047] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/06/2016] [Revised: 05/07/2016] [Accepted: 05/10/2016] [Indexed: 11/22/2022]
Abstract
BACKGROUND Predicting the effects of genetic modification is difficult due to the complexity of metabolic networks. Various gene knockout strategies have been utilised to deactivate specific genes in order to determine the effects of these genes on the function of microbes. Deactivation of genes can lead to deletion of certain proteins and functions. Through these strategies, the associated function of a deleted gene can be identified from the metabolic networks. METHODS The main aim of this paper is to review the available techniques in gene knockout strategies for microbial cells. The review covers their methodology and recent applications in microbial cells. In addition, the advantages and disadvantages of the techniques are compared and discussed, and the related patents are listed. RESULTS Traditionally, gene knockout is done through wet lab (in vivo) techniques, which are conducted through laboratory experiments. However, these techniques are costly and time consuming. Hence, various dry lab (in silico) techniques, which are conducted using computational approaches, have been developed to surmount these problems. CONCLUSION The development of numerous techniques for gene knockout in microbial cells has brought many advancements in the study of gene functions. Based on the literature, we found that the gene knockout strategies currently used are sensibly implemented with regard to their benefits.
|
29
|
A Review on Metabolic Pathway Analysis in Biological Production. Mini-Rev Org Chem 2015. [DOI: 10.2174/1570193x13666151218191358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
30
|
A Review on the Bioinformatics Tools for Neuroimaging. Malays J Med Sci 2015; 22:9-19. [PMID: 27006633 PMCID: PMC4795522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2015] [Accepted: 11/03/2015] [Indexed: 06/05/2023] Open
Abstract
Neuroimaging is a new technique used to create images of the structure and function of the nervous system in the human brain. Currently, it is crucial in scientific fields. Neuroimaging data are becoming of more interest among the circle of neuroimaging experts. Therefore, it is necessary to develop a large amount of neuroimaging tools. This paper gives an overview of the tools that have been used to image the structure and function of the nervous system. This information can help developers, experts, and users gain insight and a better understanding of the neuroimaging tools available, enabling better decision making in choosing tools of particular research interest. Sources, links, and descriptions of the application of each tool are provided in this paper as well. Lastly, this paper presents the language implemented, system requirements, strengths, and weaknesses of the tools that have been widely used to image the structure and function of the nervous system.
|
31
|
Metabolites production improvement by identifying minimal genomes and essential genes using flux balance analysis. Int J Data Min Bioin 2015; 12:85-99. [PMID: 26489144 DOI: 10.1504/ijdmb.2015.068955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
With the advancement in metabolic engineering technologies, the genome of host organisms can be reconstructed to achieve desired phenotypes. However, due to the complexity and size of the genome-scale metabolic network, significant components tend to be invisible. We propose an approach to improve metabolite production that consists of two steps. First, we find the essential genes and identify the minimal genome by a single gene deletion process using Flux Balance Analysis (FBA). Second, we identify the significant pathway for the metabolite production using gene expression data. A genome-scale model of Saccharomyces cerevisiae for production of vanillin and acetate is used to test this approach. The results show the reliability of this approach in finding essential genes, reducing genome size and identifying production pathways that can further optimise the production yield. The identified genes and pathways can be extended to other applications, especially in strain optimisation.
|
32
|
A newton cooperative genetic algorithm method for in silico optimization of metabolic pathway production. PLoS One 2015; 10:e0126199. [PMID: 25961295 PMCID: PMC4427276 DOI: 10.1371/journal.pone.0126199] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/15/2014] [Accepted: 03/28/2015] [Indexed: 11/25/2022] Open
Abstract
This paper presents an in silico optimization method of metabolic pathway production. The metabolic pathway can be represented by a mathematical model known as the generalized mass action model, which leads to a complex system of nonlinear equations. The optimization process becomes difficult when the steady state and the constraints of the components in the metabolic pathway are involved. To deal with this situation, this paper presents an in silico optimization method, namely the Newton Cooperative Genetic Algorithm (NCGA). The NCGA uses the Newton method to deal with the metabolic pathway, and then integrates a genetic algorithm and a cooperative co-evolutionary algorithm. The proposed method was experimentally applied to the benchmark metabolic pathways, and the results showed that the NCGA achieved better results compared to the existing methods.
|
33
|
A hybrid of bees algorithm and flux balance analysis (BAFBA) for the optimisation of microbial strains. Int J Data Min Bioin 2015; 10:225-38. [PMID: 25796740 DOI: 10.1504/ijdmb.2014.064016] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
The development of microbial production systems has become popular in recent years as microbial hosts offer a number of unique advantages for both native and heterologous small molecules. However, the main drawback is low yield or productivity of the desired products. Optimisation algorithms were implemented in previous works to identify the effects of gene knockout. Nevertheless, the previous works faced performance issues. Thus, a hybrid of Bees Algorithm and Flux Balance Analysis (BAFBA) is proposed in this paper to improve the performance in predicting optimal sets of gene deletions for maximising the growth rate and production yield of certain metabolites. This paper involves two datasets, E. coli and S. cerevisiae. The list of knockout genes, growth rate and production yield after the deletion are the results from the experiments. BAFBA presents better results compared to the other methods, and the identified list may be useful in solving genetic engineering problems.
|
34
|
A Review on Bioinformatics Enrichment Analysis Tools Towards Functional Analysis of High Throughput Gene Set Data. Curr Proteomics 2015. [DOI: 10.2174/157016461201150506200927] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
|
35
|
Differential Bees Flux Balance Analysis with OptKnock for in silico microbial strains optimization. PLoS One 2014; 9:e102744. [PMID: 25047076 PMCID: PMC4105462 DOI: 10.1371/journal.pone.0102744] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 01/01/2014] [Accepted: 06/23/2014] [Indexed: 01/16/2023] Open
Abstract
Microbial strain optimization for the overproduction of desired phenotypes has been a popular topic in recent years. The strains can be optimized through several techniques in the field of genetic engineering. Gene knockout is a genetic engineering technique that can engineer the metabolism of microbial cells with the objective of obtaining desirable phenotypes. However, the complexity of metabolic networks has made it challenging to identify the effects of genetic modification on the desirable phenotypes. Furthermore, the vast number of reactions in cellular metabolism often leads to a combinatorial problem in obtaining an optimal gene deletion strategy. The size of a genome-scale metabolic model is usually large, and as the size of the problem increases, the computation time increases exponentially. In this paper, we propose Differential Bees Flux Balance Analysis (DBFBA) with OptKnock to identify optimal gene knockout strategies for maximizing the production yield of desired phenotypes while sustaining the growth rate. The proposed method improves the performance of a hybrid of Bees Algorithm and Flux Balance Analysis (BAFBA) by hybridizing the Differential Evolution (DE) algorithm into the neighborhood searching strategy of BAFBA. In addition, DBFBA is integrated with OptKnock to validate the results and improve the reliability of the work. Through several experiments conducted on Escherichia coli, Bacillus subtilis, and Clostridium thermocellum as the model organisms, DBFBA has shown better performance in terms of computational time, stability, growth rate, and production yield of desired phenotypes compared to the methods used in previous works.
|
36
|
A synchronous-asynchronous particle swarm optimisation algorithm. ScientificWorldJournal 2014; 2014:123019. [PMID: 25121109 PMCID: PMC4121262 DOI: 10.1155/2014/123019] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/23/2014] [Accepted: 06/20/2014] [Indexed: 11/18/2022] Open
Abstract
In the original particle swarm optimisation (PSO) algorithm, the particles' velocities and positions are updated after the whole swarm performance is evaluated. This algorithm is also known as synchronous PSO (S-PSO). The strength of this update method is in the exploitation of the information. Asynchronous update PSO (A-PSO) has been proposed as an alternative to S-PSO. A particle in A-PSO updates its velocity and position as soon as its own performance has been evaluated. Hence, particles are updated using partial information, leading to stronger exploration. In this paper, we attempt to improve PSO by merging both update methods to utilise the strengths of both methods. The proposed synchronous-asynchronous PSO (SA-PSO) algorithm divides the particles into smaller groups. The best member of a group and the swarm's best are chosen to lead the search. Members within a group are updated synchronously, while the groups themselves are asynchronously updated. Five well-known unimodal functions, four multimodal functions, and a real world optimisation problem are used to study the performance of SA-PSO, which is compared with the performances of S-PSO and A-PSO. The results are statistically analysed and show that the proposed SA-PSO has performed consistently well.
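The difference between the two update schemes described in this abstract can be sketched in a few lines. The code below is an illustrative sketch only: it shows plain synchronous and asynchronous PSO, not the grouped SA-PSO variant, and the function names, the sphere test function, and the coefficient values (`w`, `c1`, `c2`) are assumptions chosen for the example rather than settings from the paper.

```python
import random


def sphere(x):
    """Simple unimodal test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=10, iters=50, synchronous=True, seed=0):
    """Minimise `objective`; return (best_position, best_value)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration coefficients
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    def evaluate(i):
        # Score particle i and refresh its personal best and the swarm best.
        nonlocal gbest, gbest_val
        val = objective(pos[i])
        if val < pbest_val[i]:
            pbest[i], pbest_val[i] = pos[i][:], val
        if val < gbest_val:
            gbest, gbest_val = pos[i][:], val

    def move(i):
        # Standard velocity and position update for particle i.
        for d in range(dim):
            r1, r2 = rng.random(), rng.random()
            vel[i][d] = (w * vel[i][d]
                         + c1 * r1 * (pbest[i][d] - pos[i][d])
                         + c2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]

    for _ in range(iters):
        if synchronous:
            # S-PSO: evaluate the whole swarm first, then move every particle.
            for i in range(n_particles):
                evaluate(i)
            for i in range(n_particles):
                move(i)
        else:
            # A-PSO: each particle moves right after its own evaluation.
            for i in range(n_particles):
                evaluate(i)
                move(i)
    return gbest, gbest_val


if __name__ == "__main__":
    print(pso(sphere, synchronous=True)[1], pso(sphere, synchronous=False)[1])
```

The only difference between the two modes is the loop structure: in the asynchronous branch a particle may move using a swarm best that later particles in the same iteration have not yet had a chance to improve.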
|
37
|
A hybrid of ant colony optimization and minimization of metabolic adjustment to improve the production of succinic acid in Escherichia coli. Comput Biol Med 2014; 49:74-82. [DOI: 10.1016/j.compbiomed.2014.03.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/29/2013] [Revised: 03/05/2014] [Accepted: 03/26/2014] [Indexed: 11/24/2022]
|
38
|
A review on the computational approaches for gene regulatory network construction. Comput Biol Med 2014; 48:55-65. [PMID: 24637147 DOI: 10.1016/j.compbiomed.2014.02.011] [Citation(s) in RCA: 148] [Impact Index Per Article: 14.8] [Received: 09/16/2013] [Revised: 02/14/2014] [Accepted: 02/17/2014] [Indexed: 01/08/2023]
Abstract
Many biological research areas such as drug design require gene regulatory networks to provide clear insight and understanding of the cellular process in living cells. This is because interactions among the genes and their products play an important role in many molecular processes. A gene regulatory network can act as a blueprint for the researchers to observe the relationships among genes. Due to its importance, several computational approaches have been proposed to infer gene regulatory networks from gene expression data. In this review, six inference approaches are discussed: Boolean network, probabilistic Boolean network, ordinary differential equation, neural network, Bayesian network, and dynamic Bayesian network. These approaches are discussed in terms of introduction, methodology and recent applications of these approaches in gene regulatory network construction. These approaches are also compared in the discussion section. Furthermore, the strengths and weaknesses of these computational approaches are described.
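Of the six approaches listed in this abstract, the Boolean network is the simplest to illustrate. The sketch below assumes a made-up three-gene network with synchronous updates; the gene names and rules are hypothetical and are not taken from the review.

```python
def step(state, rules):
    """Apply every rule to the current state at once (synchronous update)."""
    return {gene: rule(state) for gene, rule in rules.items()}


def simulate(state, rules, n_steps):
    """Return the list of states visited, starting with the initial state."""
    trajectory = [dict(state)]
    for _ in range(n_steps):
        state = step(state, rules)
        trajectory.append(state)
    return trajectory


# Hypothetical regulatory rules: C represses A, A activates B,
# and C is switched on only when both A and B are on.
RULES = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}

if __name__ == "__main__":
    start = {"A": True, "B": False, "C": False}
    for t, s in enumerate(simulate(start, RULES, 5)):
        print(t, s)
```

Because each gene is either on or off, the state space is finite and every trajectory eventually enters a cycle; in this toy network the starting state recurs after five steps.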
|
39
|
An improved differential evolution algorithm for enhancing biochemical pathways simulation and production. Int J Data Min Bioin 2014; 10:424-39. [PMID: 25946887 DOI: 10.1504/ijdmb.2014.064893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
This paper presents an Improved Differential Evolution (IDE) algorithm to improve the kinetic parameter estimation in simulating the glycolysis pathway and the threonine biosynthesis pathway. Experimentally derived time series kinetic data are noisy and involve many unknown parameters. These characteristics lead to lengthy computation times when estimating the optimum values of the kinetic parameters. To solve this problem, this study developed a hybrid method that combines the Differential Evolution algorithm (DE) and the Kalman Filter (KF) to produce IDE. Results have shown that IDE requires less computation time (6% and 18.5% faster) and is more robust to noisy data, with significantly reduced error rates (93% and 79% lower), compared with the Genetic Algorithm (GA) and DE, respectively, in glycolysis and threonine biosynthesis pathway simulations. IDE is reliable, as it demonstrated consistent standard deviation values that were close to the mean values. We foresee the applicability of IDE to other metabolic pathway simulations.
|
40
|
Experimental study on the cooling performance of high power LED arrays under natural convection. IOP Conf Ser Mater Sci Eng 2013. [DOI: 10.1088/1757-899x/50/1/012030] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
|
41
|
Parameter Estimation By Using An Improved Bee Memory Differential Evolution Algorithm (IBMDE) To Simulate Biochemical Pathways. Curr Bioinform 2013. [DOI: 10.2174/15748936113089990007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
42
|
An enhancement of binary particle swarm optimization for gene selection in classifying cancer classes. Algorithms Mol Biol 2013; 8:15. [PMID: 23617960 PMCID: PMC3847130 DOI: 10.1186/1748-7188-8-15] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 06/22/2012] [Accepted: 04/16/2013] [Indexed: 11/10/2022] Open
Abstract
Background Gene expression data could likely be a momentous help in the progress of proficient cancer diagnoses and classification platforms. Lately, many researchers analyze gene expression data using diverse computational intelligence methods, for selecting a small subset of informative genes from the data for cancer classification. Many computational methods face difficulties in selecting small subsets due to the small number of samples compared to the huge number of genes (high-dimension), irrelevant genes, and noisy genes. Methods We propose an enhanced binary particle swarm optimization to perform the selection of small subsets of informative genes which is significant for cancer classification. Particle speed, rule, and modified sigmoid function are introduced in this proposed method to increase the probability of the bits in a particle’s position to be zero. The method was empirically applied to a suite of ten well-known benchmark gene expression data sets. Results The performance of the proposed method proved to be superior to other previous related works, including the conventional version of binary particle swarm optimization (BPSO) in terms of classification accuracy and the number of selected genes. The proposed method also requires lower computational time compared to BPSO.
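For context, the conventional BPSO that this abstract uses as its baseline turns each velocity component into a selection probability through a sigmoid. The sketch below shows that conventional rule only; the paper's modified sigmoid, particle speed, and update rule are not specified in the abstract and are not reproduced here. The function names are illustrative.

```python
import math
import random


def sigmoid(v: float) -> float:
    """Map a velocity component to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def update_position(velocity, rng):
    """Conventional BPSO rule: bit d becomes 1 with probability sigmoid(v_d)."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]


if __name__ == "__main__":
    rng = random.Random(0)
    # Each bit marks whether the corresponding gene is selected.
    print(update_position([2.0, 0.0, -2.0, -4.0], rng))
```

With this rule a velocity of zero gives a 50% chance of selecting a gene, so an unmodified BPSO tends to keep roughly half of all genes; a transfer function that raises the probability of a zero bit, as the abstract describes, pushes the search toward smaller subsets.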
|
43
|
An improved swarm optimization for parameter estimation and biological model selection. PLoS One 2013; 8:e61258. [PMID: 23593445 PMCID: PMC3623867 DOI: 10.1371/journal.pone.0061258] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 07/13/2012] [Accepted: 03/11/2013] [Indexed: 11/19/2022] Open
Abstract
One of the key aspects of computational systems biology is the investigation on the dynamic biological processes within cells. Computational models are often required to elucidate the mechanisms and principles driving the processes because of the nonlinearity and complexity. The models usually incorporate a set of parameters that signify the physical properties of the actual biological systems. In most cases, these parameters are estimated by fitting the model outputs with the corresponding experimental data. However, this is a challenging task because the available experimental data are frequently noisy and incomplete. In this paper, a new hybrid optimization method is proposed to estimate these parameters from the noisy and incomplete experimental data. The proposed method, called Swarm-based Chemical Reaction Optimization, integrates the evolutionary searching strategy employed by the Chemical Reaction Optimization, into the neighbouring searching strategy of the Firefly Algorithm method. The effectiveness of the method was evaluated using a simulated nonlinear model and two biological models: synthetic transcriptional oscillators, and extracellular protease production models. The results showed that the accuracy and computational speed of the proposed method were better than the existing Differential Evolution, Firefly Algorithm and Chemical Reaction Optimization methods. The reliability of the estimated parameters was statistically validated, which suggests that the model outputs produced by these parameters were valid even when noisy and incomplete experimental data were used. Additionally, Akaike Information Criterion was employed to evaluate the model selection, which highlighted the capability of the proposed method in choosing a plausible model based on the experimental data. In conclusion, this paper presents the effectiveness of the proposed method for parameter estimation and model selection problems using noisy and incomplete experimental data. 
We hope this study provides new insight into developing more accurate and reliable biological models based on limited and low-quality experimental data.
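The model selection step mentioned in this abstract relies on the Akaike Information Criterion, AIC = 2k - 2 ln(L), where k is the number of parameters and L the maximised likelihood. A minimal sketch follows, assuming the common least-squares form for Gaussian errors, AIC = n ln(RSS/n) + 2k, which drops an additive constant; the abstract does not state which form the authors used, and the function name and residual values are hypothetical.

```python
import math


def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit, up to an additive constant."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)  # residual sum of squares
    return n * math.log(rss / n) + 2 * n_params


if __name__ == "__main__":
    # Hypothetical residuals from two candidate models fitted to the same data.
    simple_model = aic_least_squares([0.9, -1.1, 1.0, -1.0], n_params=2)
    complex_model = aic_least_squares([0.9, -1.0, 1.0, -1.0], n_params=5)
    print(simple_model, complex_model)  # the lower value is preferred
```

The 2k term penalises extra parameters, so a more complex model is preferred only when it reduces the residual error enough to offset that penalty.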
|
44
|
Validation of Hierarchical Gene Clusters Using Repeated Measurements. Jurnal Teknologi 2013; 61. [DOI: 10.11113/jt.v61.1616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Hierarchical clustering is an unsupervised technique that is commonly used to study protein and gene expression data. In clustering, the expression patterns of different genes are grouped into distinct clusters, in which the genes in the same cluster are assumed to be potentially functionally related or influenced by a common upstream factor. Although clustering methods have rapidly become one of the standard computational approaches in the literature on microarray gene expression data analysis, the uncertainty in the results obtained remains a concern. Experimental repetitions are generally performed to overcome the drawbacks of biological variability and technical variability. In this study, the author proposes repeated measurement to evaluate the stability of gene clusters. This paper aims to show that the stability of the gene clusters, incorporated with repeated measurement, can be used for further analysis.
|
45
|
|
46
|
Improved Differential Evolution Algorithm for Parameter Estimation to Improve the Production of Biochemical Pathway. International Journal of Interactive Multimedia and Artificial Intelligence 2012. [DOI: 10.9781/ijimai.2012.153] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/27/2022]
|
47
|
An improved hybrid of SVM and SCAD for pathway analysis. Bioinformation 2011; 7:169-75. [PMID: 22102773 PMCID: PMC3218518 DOI: 10.6026/97320630007169] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/29/2011] [Accepted: 10/02/2011] [Indexed: 11/23/2022] Open
Abstract
Pathway analysis has led to a new era in genomic research by providing further biological process information compared to traditional single gene analysis. Besides this advantage, pathway analysis poses some challenges to researchers, one of which is the quality of the pathway data itself. Pathway data are usually defined without a specific biological context; when it comes to a specific biological context (e.g. lung cancer disease), typically only several genes within pathways are responsible for the corresponding cellular process. Some pathways may also include uninformative genes, or informative genes may have been excluded. Moreover, many algorithms in pathway analysis neglect these limitations by treating all the genes within pathways as significant. In a previous study, a hybrid of support vector machines and smoothly clipped absolute deviation with group-specific tuning parameters (gSVM-SCAD) was proposed in order to identify and select the informative genes before the pathway evaluation process. However, gSVM-SCAD showed a limitation in terms of classification accuracy. To deal with this limitation, we enhanced the tuning parameter method for gSVM-SCAD by applying the B-type generalized approximate cross validation (BGACV). Experimental analyses using one simulated dataset and two gene expression datasets have shown that the proposed method obtains significant results in identifying biologically significant genes and pathways, and in classification accuracy.
|
48
|
Random forest for gene selection and microarray data classification. Bioinformation 2011; 7:142-6. [PMID: 22125385 PMCID: PMC3218317 DOI: 10.6026/97320630007142] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 08/30/2011] [Accepted: 09/21/2011] [Indexed: 11/23/2022] Open
Abstract
A random forest method has been selected to perform both gene selection and classification of the microarray data. In this embedded method, selecting the smallest possible sets of genes with the lowest error rates is the key factor in achieving the highest classification accuracy. Hence, an improved gene selection method using random forest has been proposed to obtain the smallest subset of genes as well as the biggest subset of genes prior to classification. The option for biggest subset selection is provided to assist researchers who intend to use the informative genes for further research. The enhanced random forest gene selection performed better in selecting both the smallest and the biggest subsets of informative genes with the lowest out-of-bag error rates. Furthermore, classification performed on the selected subset of genes using random forest has led to lower prediction error rates compared to the existing method and other similar available methods.
|
49
|
A modified binary particle swarm optimization for selecting the small subset of informative genes from gene expression data. IEEE Trans Inf Technol Biomed 2011; 15:813-22. [PMID: 21914573 DOI: 10.1109/titb.2011.2167756] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Indexed: 11/08/2022]
Abstract
Gene expression data are expected to be of significant help in the development of efficient cancer diagnosis and classification platforms. To select a small subset of informative genes from the data for cancer classification, many researchers have recently been analyzing gene expression data using various computational intelligence methods. However, because of the small number of samples compared to the huge number of genes (high dimensionality), irrelevant genes, and noisy genes, many of these computational methods have difficulty selecting a small subset. Thus, we propose an improved (modified) binary particle swarm optimization to select a small subset of informative genes that is relevant for cancer classification. In the proposed method, we introduce a particle's speed, which gives the rate at which the particle changes its position, and we propose a rule for updating the particle's position. By performing experiments on ten different gene expression datasets, we have found that the proposed method outperforms previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also has lower running times than BPSO.
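As a point of reference for the conventional BPSO that this abstract compares against, a minimal sketch follows. It is the textbook sigmoid-based BPSO, not the authors' modified speed and position-update rule, which the abstract does not specify. The fitness function, the `INFORMATIVE` gene indices, and all parameter values (`w`, `c1`, `c2`, `v_max`) are hypothetical choices made for illustration; in practice the fitness would typically be a classifier's cross-validated accuracy on the selected genes.

```python
import math
import random


def sigmoid(v):
    """Map a velocity to the probability that the corresponding bit is 1."""
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(fitness, n_genes, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Conventional BPSO: maximise fitness(mask) over 0/1 gene masks."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_genes)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_genes):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull towards personal and global bests.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                # Position update: the bit is 1 with probability sigmoid(velocity).
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical toy fitness: reward three "informative" genes, penalise subset size.
INFORMATIVE = {2, 5, 11}


def toy_fitness(mask):
    hits = sum(1 for j in INFORMATIVE if mask[j])
    return hits - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_genes=15)
selected = [j for j, bit in enumerate(best_mask) if bit]
print("selected genes:", selected, "fitness:", round(best_fit, 2))
```

The size penalty in the toy fitness stands in for the goal of a small gene subset; a real study would trade classification accuracy against the number of selected genes in a similar way.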
|
50
|
Gene subset selection using an iterative approach based on genetic algorithms. Artificial Life and Robotics 2009. [DOI: 10.1007/s10015-009-0711-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|