1
Abdullah-Zawawi MR, Govender N, Karim MB, Altaf-Ul-Amin M, Kanaya S, Mohamed-Hussein ZA. Chemoinformatics-driven classification of Angiosperms using sulfur-containing compounds and machine learning algorithm. Plant Methods 2022; 18:118. [PMID: 36335358 PMCID: PMC9636760 DOI: 10.1186/s13007-022-00951-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/16/2022] [Accepted: 10/14/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND Phytochemicals or secondary metabolites are low molecular weight organic compounds with little function in plant growth and development. Nevertheless, metabolite diversity governs not only the phenetics of an organism but may also inform the evolutionary pattern and adaptation of green plants to a changing environment. Plant chemoinformatics analyzes the chemical system of natural products using computational tools and robust mathematical algorithms. It has been a powerful approach for species-level differentiation and is widely employed for species classification and for reinforcing previous classifications.
RESULTS This study attempts to classify Angiosperms using plant sulfur-containing compound (SCC), or sulphated compound, information. The SCC dataset of 692 plant species was collected from the comprehensive species-metabolite relationship family (KNApSAcK) database. The structural similarity scores of metabolite pairs under all possible combinations (plant species-metabolite) were determined, and metabolite pairs with a Tanimoto coefficient > 0.85 were selected for clustering using a machine learning algorithm. Metabolite clustering showed an association between clusters of structurally similar metabolites and metabolite content among the plant species. Phylogenetic tree construction of Angiosperms displayed three major clades, of which clade 1 and clade 2 represented eudicots only and clade 3 a mixture of eudicots and monocots. The SCC-based construction of Angiosperm phylogeny is a subset of the existing monocot-dicot classification. The majority of eudicots present in clades 1 and 2 were represented by glucosinolate compounds. These clades with SCC may have been a mixture of ancestral species, whilst the combined presence of monocots and dicots in clade 3 suggests diversification of sulphated chemical structures in the course of adaptation during evolutionary change.
CONCLUSIONS Sulphated chemoinformatics informs the classification of Angiosperms via a machine learning technique.
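The pair-selection step in the abstract above (keep metabolite pairs whose Tanimoto coefficient exceeds 0.85) can be sketched as follows. This is only a minimal illustration, assuming each metabolite is represented as a set of "on" fingerprint bit indices; the function names and the toy fingerprints are hypothetical and are not taken from the paper, which does not state its fingerprint type here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_pairs(fingerprints, threshold=0.85):
    """Return (name1, name2, score) for every pair scoring strictly above threshold."""
    names = sorted(fingerprints)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = tanimoto(fingerprints[first], fingerprints[second])
            if score > threshold:
                pairs.append((first, second, score))
    return pairs


# Hypothetical toy fingerprints (not real metabolite data).
toy = {
    "m1": set(range(0, 20)),
    "m2": set(range(0, 19)),   # shares 19 of 20 bits with m1
    "m3": set(range(10, 30)),  # overlaps m1 on only 10 bits
}

selected = similar_pairs(toy)
```

With these toy inputs only the pair (m1, m2) passes the 0.85 cut-off; the selected pairs would then be handed to whatever clustering method is used downstream.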
Affiliation(s)
- Muhammad-Redha Abdullah-Zawawi
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Malaysia
  - UKM Medical Molecular Biology Institute (UMBI), Jalan Yaacob Latif, Bandar Tun Razak, 56000 Cheras, Kuala Lumpur, Malaysia
- Nisha Govender
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Malaysia
- Mohammad Bozlul Karim
  - Graduate School Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Md Altaf-Ul-Amin
  - Graduate School Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Shigehiko Kanaya
  - Graduate School Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Zeti-Azura Mohamed-Hussein
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Malaysia
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Malaysia
2
Wijaya SH, Afendi FM, Batubara I, Huang M, Ono N, Kanaya S, Altaf-Ul-Amin M. Identification of Targeted Proteins by Jamu Formulas for Different Efficacies Using Machine Learning Approach. Life (Basel) 2021; 11:866. [PMID: 34440610 PMCID: PMC8398944 DOI: 10.3390/life11080866] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2021] [Revised: 08/12/2021] [Accepted: 08/18/2021] [Indexed: 11/23/2022]
Abstract
BACKGROUND We performed in silico prediction of the interactions between compounds of Jamu herbs and human proteins by utilizing data-intensive science and machine learning methods. Verifying the proteins that are targeted by compounds of natural herbs will be helpful to select natural herb-based drug candidates.
METHODS Initially, data related to compounds, target proteins, and interactions between them were collected from open access databases. Compounds are represented by molecular fingerprints, whereas amino acid sequences are represented by numerical protein descriptors. Then, prediction models that predict the interactions between compounds and target proteins were constructed using support vector machine and random forest.
RESULTS A random forest model constructed based on MACCS fingerprint and amino acid composition obtained the highest accuracy. We used the best model to predict target proteins for 94 important Jamu compounds and assessed the results by supporting evidence from published literature and other sources. There are 27 compounds that can be validated by professional doctors, and those compounds belong to seven efficacy groups.
CONCLUSION By comparing the efficacy of predicted compounds and the relations of the targeted proteins with diseases, we found that some compounds might be considered as drug candidates.
Affiliation(s)
- Sony Hartono Wijaya
  - Department of Computer Science, IPB University, Kampus IPB Dramaga Wing 20 Level 5, Bogor 16680, Indonesia
  - Tropical Biopharmaca Research Center, IPB University, Kampus IPB Taman Kencana, Bogor 16128, Indonesia
- Farit Mochamad Afendi
  - Tropical Biopharmaca Research Center, IPB University, Kampus IPB Taman Kencana, Bogor 16128, Indonesia
  - Department of Statistics, IPB University, Kampus IPB Dramaga Wing 22 Level 4, Bogor 16680, Indonesia
- Irmanida Batubara
  - Tropical Biopharmaca Research Center, IPB University, Kampus IPB Taman Kencana, Bogor 16128, Indonesia
  - Department of Chemistry, IPB University, Kampus IPB Dramaga Wing 1 Level 3, Bogor 16128, Indonesia
- Ming Huang
  - Computational Systems Biology Laboratory, Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma 630-0192, Nara, Japan
- Naoaki Ono
  - Computational Systems Biology Laboratory, Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma 630-0192, Nara, Japan
- Shigehiko Kanaya
  - Computational Systems Biology Laboratory, Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma 630-0192, Nara, Japan
- Md. Altaf-Ul-Amin
  - Computational Systems Biology Laboratory, Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma 630-0192, Nara, Japan
3
Hossain SF, Huang M, Ono N, Morita A, Kanaya S, Altaf-Ul-Amin M. Development of a biomarker database toward performing disease classification and finding disease interrelations. Database (Oxford) 2021; 2021:baab011. [PMID: 33705530 PMCID: PMC7951048 DOI: 10.1093/database/baab011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2020] [Revised: 02/19/2021] [Accepted: 02/25/2021] [Indexed: 12/11/2022]
Abstract
A biomarker is a measurable indicator of a disease or abnormal state of a body that plays an important role in disease diagnosis, prognosis and treatment. The biomarker has become a significant topic due to its versatile usage in the medical field and in rapid detection of the presence or severity of some diseases. The volume of biomarker data is rapidly increasing and the identified data are scattered. To provide comprehensive information, the explosively growing data need to be recorded in a single platform. There is no open-source freely available comprehensive online biomarker database. To fulfill this purpose, we have developed a human biomarker database as part of the KNApSAcK family databases which contain a vast quantity of information on the relationships between biomarkers and diseases. We have classified the diseases into 18 disease classes, mostly according to the National Center for Biotechnology Information definitions. Apart from this database development, we also have performed disease classification by separately using protein and metabolite biomarkers based on the network clustering algorithm DPClusO and hierarchical clustering. Finally, we reached a conclusion about the relationships among the disease classes. The human biomarker database can be accessed online and the inter-disease relationships may be helpful in understanding the molecular mechanisms of diseases. To our knowledge, this is one of the first approaches to classify diseases based on biomarkers. Database URL: http://www.knapsackfamily.com/Biomarker/top.php.
Affiliation(s)
- Shaikh Farhad Hossain
  - Computational Systems Biology Lab, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
- Ming Huang
  - Computational Systems Biology Lab, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
- Naoaki Ono
  - Computational Systems Biology Lab, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
- Aki Morita
  - Computational Systems Biology Lab, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
- Shigehiko Kanaya
  - Computational Systems Biology Lab, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
- Md Altaf-Ul-Amin
  - Computational Systems Biology Lab, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
4
A cloud based knowledge discovery framework, for medicinal plants from PubMed literature. Informatics in Medicine Unlocked 2019. [DOI: 10.1016/j.imu.2019.100226] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022]
5
Behera NK, Mahalakshmi G. A cloud based knowledge discovery framework, for medicinal plants from PubMed literature. Informatics in Medicine Unlocked 2019. [DOI: 10.1016/j.imu.2018.04.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/01/2022]
6
Suparmi S, Widiastuti D, Wesseling S, Rietjens IMCM. Natural occurrence of genotoxic and carcinogenic alkenylbenzenes in Indonesian jamu and evaluation of consumer risks. Food Chem Toxicol 2018; 118:53-67. [PMID: 29727721 DOI: 10.1016/j.fct.2018.04.059] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/05/2018] [Revised: 04/24/2018] [Accepted: 04/25/2018] [Indexed: 12/15/2022]
Abstract
The consumer risks of jamu, Indonesian traditional herbal medicines, were assessed, focussing on the presence of alkenylbenzene-containing botanical ingredients. Twenty-three out of 25 samples contained alkenylbenzenes at levels ranging from 3.8 to 440 μg/kg, with methyleugenol being the most frequently encountered alkenylbenzene. The estimated daily intake (EDI) resulting from jamu consumption amounted to 0.2-171 μg/kg bw/day for individual alkenylbenzenes, to 0.9-203 μg/kg bw/day when adding up all alkenylbenzenes detected, and to 0.9-551 μg/kg bw/day when expressed in methyleugenol equivalents using interim relative potency (REP) factors. The margin of exposure (MOE) values obtained were generally <10,000, indicating a priority for risk management when assuming daily consumption during a lifetime. Using Haber's rule, it was estimated that consuming these jamu for two weeks only once would not raise a concern (MOE >10,000). However, when considering use for two weeks every year during a lifetime, 5 samples still raise a concern. It is concluded that the consumption of alkenylbenzene-containing jamu can be of concern, especially when consumed daily over longer periods of time.
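As a rough illustration of the arithmetic behind the MOE and the Haber's rule adjustment mentioned in the abstract above, the sketch below assumes that the MOE is the ratio of a toxicological reference point (for example a BMDL10) to the EDI, and that a short exposure is averaged linearly over a lifetime. The reference point value, the 75-year lifetime and the function names are illustrative assumptions, not values taken from the paper.

```python
def margin_of_exposure(reference_point, edi):
    """MOE = reference point / estimated daily intake (both in the same units)."""
    if edi <= 0:
        raise ValueError("EDI must be positive")
    return reference_point / edi


def lifetime_averaged_edi(edi, exposure_weeks, lifetime_years=75.0):
    """Spread a short-term intake linearly over a lifetime (Haber's rule style).

    lifetime_years = 75 is an assumed default, and a year is taken as 52 weeks.
    """
    lifetime_weeks = lifetime_years * 52
    return edi * exposure_weeks / lifetime_weeks


# Hypothetical inputs, in ug/kg bw/day (NOT values reported by the study).
reference_point = 15000.0
daily_edi = 171.0

moe_daily = margin_of_exposure(reference_point, daily_edi)
moe_two_weeks_once = margin_of_exposure(
    reference_point, lifetime_averaged_edi(daily_edi, exposure_weeks=2)
)
```

Under these assumed inputs the lifetime daily-use MOE falls below 10,000 while the single two-week exposure gives an MOE above 10,000, which mirrors the qualitative pattern the abstract describes.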
Affiliation(s)
- Suparmi Suparmi
  - Division of Toxicology, Wageningen University and Research, Stippeneng 4, 6708 WE, Wageningen, The Netherlands
  - Department of Biology, Faculty of Medicine, Universitas Islam Sultan Agung, Jl. Raya Kaligawe KM 4, 50112, Semarang, Indonesia
- Diana Widiastuti
  - Division of Toxicology, Wageningen University and Research, Stippeneng 4, 6708 WE, Wageningen, The Netherlands
  - The National Agency for Drug and Food Control (NADFC), Jl. Percetakan Negara No.23, 10560, Jakarta, Indonesia
- Sebastiaan Wesseling
  - Division of Toxicology, Wageningen University and Research, Stippeneng 4, 6708 WE, Wageningen, The Netherlands
- Ivonne M C M Rietjens
  - Division of Toxicology, Wageningen University and Research, Stippeneng 4, 6708 WE, Wageningen, The Netherlands
7
Wijaya SH, Batubara I, Nishioka T, Altaf-Ul-Amin M, Kanaya S. Metabolomic Studies of Indonesian Jamu Medicines: Prediction of Jamu Efficacy and Identification of Important Metabolites. Mol Inform 2017; 36. [PMID: 28682479 DOI: 10.1002/minf.201700050] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/23/2017] [Accepted: 06/22/2017] [Indexed: 12/15/2022]
Abstract
To better understand why some Jamu formulas can be used to treat a specific disease, we performed metabolomic studies of Jamu by taking into consideration the biologically active compounds existing in plants used as Jamu ingredients. A thorough integration of information from omics is expected to provide solid evidence-based scientific rationales for the development of modern phytomedicines. This study focused on prediction of Jamu efficacy based on its component metabolites and also identification of important metabolites related to each efficacy group. Initially, we compared the performance of Support Vector Machines and Random Forest in predicting Jamu efficacy under three different data pre-processing approaches: no filtering, the Single Filtering algorithm, and a combination of the Single Filtering algorithm and feature selection using Regularized Random Forest. Both classifiers performed very well, and according to 5-fold cross-validation results the mean accuracy of the Support Vector Machine with a linear kernel was slightly better than that of Random Forest. It can be concluded that machine learning methods can successfully relate Jamu efficacy to metabolites. In addition, we extended our analysis by identifying important metabolites from the Random Forest model. The inTrees framework was used to extract the rules and to select important metabolites for each efficacy group. Overall, we identified 94 significant metabolites associated with 12 efficacy groups, and many of them were validated by published literature and the KNApSAcK Metabolite Activity database.
Affiliation(s)
- Sony Hartono Wijaya
  - Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
  - Department of Computer Science, Bogor Agricultural University, Jl. Meranti Wing 20 Level 5 Kampus IPB Dramaga, Bogor, 16680, Indonesia
- Irmanida Batubara
  - Tropical Biopharmaca Research Center, Bogor Agricultural University, Jl. Taman Kencana No. 3 Kampus IPB Taman Kencana, Bogor, 16128, Indonesia
  - Department of Chemistry, Bogor Agricultural University, Jl. Tanjung Wing 1 Level 3 Kampus IPB Dramaga, Bogor, 16680, Indonesia
- Takaaki Nishioka
  - Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
- Md Altaf-Ul-Amin
  - Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
- Shigehiko Kanaya
  - Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
8
Wijaya SH, Afendi FM, Batubara I, Darusman LK, Altaf-Ul-Amin M, Kanaya S. Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines. BMC Bioinformatics 2016; 17:520. [PMID: 27927171 PMCID: PMC5142342 DOI: 10.1186/s12859-016-1392-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/30/2016] [Accepted: 11/29/2016] [Indexed: 12/30/2022]
Abstract
Background: The binary similarity and dissimilarity measures have critical roles in the processing of data consisting of binary vectors in various fields including bioinformatics and chemometrics. These metrics express the similarity and dissimilarity values between two binary vectors in terms of the positive matches, absence mismatches or negative matches. To our knowledge, there is no published work presenting a systematic way of finding an appropriate equation to measure binary similarity that performs well for a certain data type or application. A proper method to select a suitable binary similarity or dissimilarity measure is needed to obtain better classification results.
Results: In this study, we proposed a novel approach to select binary similarity and dissimilarity measures. We collected 79 binary similarity and dissimilarity equations by extensive literature search and implemented those equations as an R package called bmeasures. We applied these metrics to quantify the similarity and dissimilarity between herbal medicine formulas belonging to the Indonesian Jamu and Japanese Kampo separately. We assessed the capability of binary equations to classify herbal medicine pairs into match and mismatch efficacies based on their similarity or dissimilarity coefficients using Receiver Operating Characteristic (ROC) curve analysis. According to the area under the ROC curve, the Indonesian Jamu and Japanese Kampo datasets yielded different rankings of binary similarity and dissimilarity measures. Out of all the equations, the Forbes-2 similarity and the Variant of Correlation similarity measures are recommended for studying the relationship between Jamu formulas and Kampo formulas, respectively.
Conclusions: The selection of binary similarity and dissimilarity measures for multivariate analysis is data dependent. The proposed method can be used to find the most suitable binary similarity and dissimilarity equation for a particular dataset. Our findings suggest that all four types of matching quantities in the Operational Taxonomic Unit (OTU) table are important for calculating the similarity and dissimilarity coefficients between herbal medicine formulas. Also, the binary similarity and dissimilarity measures that include the negative match quantity d separate herbal medicine pairs better than equations that exclude d.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1392-z) contains supplementary material, which is available to authorized users.
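To make the four OTU matching quantities mentioned in the abstract above concrete, here is a minimal sketch that counts them for two binary vectors and evaluates two textbook measures. The Jaccard and simple matching coefficients are swapped in purely as familiar examples of a measure that excludes d and one that includes it; they are not the Forbes-2 or Variant of Correlation measures recommended by the paper, and this is not the bmeasures API. The function names and toy vectors are hypothetical.

```python
def otu_counts(x, y):
    """Count the four matching quantities for two equal-length 0/1 vectors.

    a: both 1 (positive match), b: 1 in x only, c: 1 in y only,
    d: both 0 (negative match).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard similarity a / (a + b + c); ignores negative matches d."""
    a, b, c, _ = otu_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0


def simple_matching(x, y):
    """Simple matching coefficient (a + d) / (a + b + c + d); uses d."""
    a, b, c, d = otu_counts(x, y)
    n = a + b + c + d
    return (a + d) / n if n else 0.0


# Hypothetical ingredient-presence vectors for two formulas.
formula_1 = [1, 1, 0, 0, 1, 0]
formula_2 = [1, 0, 0, 0, 1, 1]

counts = otu_counts(formula_1, formula_2)
```

For these toy vectors the counts are a = 2, b = 1, c = 1, d = 2, so the two coefficients differ only through whether the shared absences are credited.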
Affiliation(s)
- Sony Hartono Wijaya
  - Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192, Japan
  - Department of Computer Science, Bogor Agricultural University, Jl. Meranti Wing 20 Level 5 Kampus IPB Dramaga, Bogor, 16680, Indonesia
- Farit Mochamad Afendi
  - Department of Statistics, Bogor Agricultural University, Jl. Meranti Wing 22 Level 4 Kampus IPB Dramaga, Bogor, 16680, Indonesia
- Irmanida Batubara
  - Tropical Biopharmaca Research Center, Bogor Agricultural University, Kampus IPB Taman Kencana, Jl. Taman Kencana No. 3, Bogor, 16128, Indonesia
- Latifah K Darusman
  - Tropical Biopharmaca Research Center, Bogor Agricultural University, Kampus IPB Taman Kencana, Jl. Taman Kencana No. 3, Bogor, 16128, Indonesia
- Md Altaf-Ul-Amin
  - Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192, Japan
- Shigehiko Kanaya
  - Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192, Japan
9
Utilization of KNApSAcK Family Databases for Developing Herbal Medicine Systems. Journal of Computer Aided Chemistry 2016. [DOI: 10.2751/jcac.17.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/28/2022]
10
Development and mining of a volatile organic compound database. BioMed Research International 2015; 2015:139254. [PMID: 26495281 PMCID: PMC4606137 DOI: 10.1155/2015/139254] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/14/2015] [Accepted: 06/14/2015] [Indexed: 12/16/2022]
Abstract
Volatile organic compounds (VOCs) are small molecules that exhibit high vapor pressure under ambient conditions and have low boiling points. Although VOCs contribute only a small proportion of the total metabolites produced by living organisms, they play an important role in chemical ecology, specifically in the biological interactions between organisms and ecosystems. VOCs are also important in the health care field, as they are presently used as biomarkers to detect various human diseases. Information on VOCs has so far been scattered across the literature, and there is still no available database describing VOCs and their biological activities. To address this, we have developed the KNApSAcK Metabolite Ecology Database, which contains information on the relationships between VOCs and their emitting organisms. The KNApSAcK Metabolite Ecology Database is also linked with the KNApSAcK Core and KNApSAcK Metabolite Activity databases to provide further information on the metabolites and their biological activities. The VOC database can be accessed online.