Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rivera-Borroto OM, Marrero-Ponce Y, García-de la Vega JM, Grau-Ábalo RDC. Comparison of Combinatorial Clustering Methods on Pharmacological Data Sets Represented by Machine Learning-Selected Real Molecular Descriptors. J Chem Inf Model 2011;51:3036-49. [DOI: 10.1021/ci2000083] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]

Cited by Other Article(s)

Moreira-Filho JT, Ranganath D, Conway M, Schmitt C, Kleinstreuer N, Mansouri K. Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow. J Cheminform 2024;16:101. [PMID: 39152469 PMCID: PMC11330086 DOI: 10.1186/s13321-024-00894-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Accepted: 08/06/2024] [Indexed: 08/19/2024] [Open Access]

Abstract

With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination. However, existing tools for chemical grouping often require specialized programming skills or the use of commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code, data analytics platform. The workflow serves as an all-encompassing tool, expertly incorporating a range of processes such as molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling effective chemical grouping and visualization of results. Furthermore, we implemented tools for interpretation, identifying key molecular descriptors for the chemical groups, and using natural language summaries to clarify the rationale behind these groupings. The workflow was designed to run seamlessly in both the KNIME local desktop version and KNIME Server WebPortal as a web application. It incorporates interactive interfaces and guides to assist users in a step-by-step manner. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset.

Scientific contributions

This work presents a novel, comprehensive chemical grouping workflow in KNIME, enhancing accessibility by integrating a user-friendly graphical interface that eliminates the need for extensive programming skills. This workflow uniquely combines several features such as automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised), with hyperparameter optimization to refine chemical grouping accuracy. Moreover, we have introduced an innovative interpretative step and natural language summaries to elucidate the underlying reasons for chemical groupings, significantly advancing the usability of the tool and interpretability of the results.

Contreras-Torres E, Marrero-Ponce Y, Terán JE, Agüero-Chapin G, Antunes A, García-Jacas CR. Fuzzy spherical truncation-based multi-linear protein descriptors: From their definition to application in structural-related predictions. Front Chem 2022;10:959143. [PMID: 36277354 PMCID: PMC9585278 DOI: 10.3389/fchem.2022.959143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022] [Open Access]

Abstract

This study introduces a set of fuzzy spherically truncated three-dimensional (3D) multi-linear descriptors for proteins. These indices codify geometric structural information from kth spherically truncated spatial-(dis)similarity two-tuple and three-tuple tensors. The coefficients of these truncated tensors are calculated by applying a smoothing value to the 3D structural encoding based on the relationships between two and three amino acids of a protein embedded into a sphere. Assuming that the geometrical center of the protein coincides with the center of the sphere, the distance between each amino acid involved in any specific interaction and the geometrical center of the protein can be computed. Then, the fuzzy membership degree of each amino acid with respect to a spherical region of interest is computed by fuzzy membership functions (FMFs). The truncation value is finally a combination of the membership degrees of the interacting amino acids, obtained by applying the arithmetic mean as the fusion rule. Several fuzzy membership functions with diverse biases in the calculation of amino acid memberships (e.g., Z-shaped (close to the center), PI-shaped (middle region), and A-Gaussian (far from the center)) were considered, as well as traditional truncation functions (e.g., Switching). These truncation functions were comparatively evaluated by exploring: 1) the frequency of membership degrees, 2) the variability and orthogonality among them, based on Shannon entropy and principal component analyses, respectively, and 3) the performance of alignment-free prediction of protein folding rates and structural classes. These analyses revealed the singularity of the proposed fuzzy spherically truncated MDs with respect to the classical (non-truncated) ones and with respect to the MDs truncated with traditional functions. They also showed improved predictive power, attaining an external correlation coefficient of 95.82% in folding rate modelling and an accuracy of 100% in distinguishing structural protein classes. These outcomes are better than the ones attained by existing approaches, justifying the theoretical contribution of this report. Thus, the fuzzy spherically truncated protein descriptors from MuLiMs-MCoMPAs (http://tomocomd.com/mulims-mcompas) are promising alignment-free predictors for modeling protein functions and properties.
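The truncation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cutoff parameters `a` and `b`, and the toy coordinates are hypothetical, and the Z-shaped curve used here is the common spline form found in fuzzy-logic toolboxes, assumed for illustration only. The sketch computes each residue's distance to the geometric center, converts it to a membership degree, and fuses the degrees of the interacting residues with the arithmetic mean.

```python
import math

def z_shaped(x, a, b):
    """Z-shaped membership (assumed spline form): 1 at or below a, 0 at or above b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    mid = (a + b) / 2.0
    if x <= mid:
        return 1.0 - 2.0 * ((x - a) / (b - a)) ** 2
    return 2.0 * ((x - b) / (b - a)) ** 2

def geometric_center(coords):
    """Unweighted mean of the 3D coordinates (taken as the sphere center)."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))

def truncation_value(coords, residues, a, b):
    """Arithmetic mean of the membership degrees of the interacting residues.

    coords   -- list of (x, y, z) tuples, one per amino acid (hypothetical input)
    residues -- indices of the two or three amino acids in the interaction
    a, b     -- hypothetical inner and outer radii of the region of interest
    """
    center = geometric_center(coords)
    memberships = [z_shaped(math.dist(coords[i], center), a, b) for i in residues]
    return sum(memberships) / len(memberships)

# Toy example: three residues on a line, centered at the origin.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (-2.0, 0.0, 0.0)]
print(truncation_value(coords, (0, 1), a=1.0, b=3.0))
```

A Z-shaped function weights residues near the center most heavily; a PI-shaped or Gaussian function would be swapped in the same way to emphasize the middle or outer region.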

Affiliation(s)

Ernesto Contreras-Torres: Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Universidad San Francisco de Quito (USFQ), Quito, Pichincha, Ecuador; Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador; BCAM—Basque Center for Applied Mathematics, Bilbao, Spain
Yovani Marrero-Ponce: Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Universidad San Francisco de Quito (USFQ), Quito, Pichincha, Ecuador; Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador; Computer-Aided Molecular “Biosilico” Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Quito, Ecuador
Julio E. Terán: Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Universidad San Francisco de Quito (USFQ), Quito, Pichincha, Ecuador; Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador; Department of Textile Engineering, Chemistry and Science, College of Textiles, North Carolina State University, Raleigh, NC, United States
Guillermin Agüero-Chapin: CIIMAR—Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
Agostinho Antunes: CIIMAR—Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
César R. García-Jacas: Cátedras Conacyt—Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico

*Correspondence: Yovani Marrero-Ponce; César R. García-Jacas

Prada Gori DN, Llanos MA, Bellera CL, Talevi A, Alberca LN. iRaPCA and SOMoC: Development and Validation of Web Applications for New Approaches for the Clustering of Small Molecules. J Chem Inf Model 2022;62:2987-2998. [PMID: 35687523 DOI: 10.1021/acs.jcim.2c00265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

Grisoni F, Schneider G. Molecular Scaffold Hopping via Holistic Molecular Representation. Methods Mol Biol 2021;2266:11-35. [PMID: 33759119 DOI: 10.1007/978-1-0716-1209-5_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/12/2023]

A unified view of density-based methods for semi-supervised clustering and classification. Data Min Knowl Discov 2020;33:1894-1952. [PMID: 32831623 PMCID: PMC7410108 DOI: 10.1007/s10618-019-00651-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/19/2018] [Accepted: 08/08/2019] [Indexed: 11/23/2022]

Diéguez-Santana K, Rivera-Borroto OM, Puris A, Pham-The H, Le-Thi-Thu H, Rasulev B, Casañola-Martin GM. Beyond model interpretability using LDA and decision trees for α-amylase and α-glucosidase inhibitor classification studies. Chem Biol Drug Des 2019;94:1414-1421. [PMID: 30908888 DOI: 10.1111/cbdd.13518] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/22/2018] [Revised: 02/17/2019] [Accepted: 03/03/2019] [Indexed: 12/17/2022]

Kaneko H. Sparse Generative Topographic Mapping for Both Data Visualization and Clustering. J Chem Inf Model 2018;58:2528-2535. [DOI: 10.1021/acs.jcim.8b00528] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]

Prathipati P, Mizuguchi K. Integration of Ligand and Structure Based Approaches for CSAR-2014. J Chem Inf Model 2015;56:974-87. [PMID: 26492437 DOI: 10.1021/acs.jcim.5b00477] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/17/2023]

Abstract

The prediction of binding poses and affinities is an area of active interest in computer-aided drug design (CADD). Given the documented limitations with either ligand or structure based approaches, we employed an integrated approach and developed a rapid protocol for binding mode and affinity predictions. This workflow was applied to the three protein targets of Community Structure-Activity Resource-2014 (CSAR-2014) exercise: Factor Xa (FXa), Spleen Tyrosine Kinase (SYK), and tRNA (guanine-N(1))-methyltransferase (TrmD). Our docking and scoring workflow incorporates compound clustering and ligand and protein structure based pharmacophore modeling, followed by local docking, minimization, and scoring. While the former part of the protocol ensures high-quality ligand alignments and mapping, the subsequent minimization and scoring provides the predicted binding modes and affinities. We made blind predictions of docking pose for 1, 5, and 14 ligands docked into 1, 2, and 12 crystal structures of FXa, SYK, and TrmD, respectively. The resulting 174 poses were compared with cocrystallized structures (1, 5, and 14 complexes) made available at the end of CSAR. Our predicted poses were related to the experimentally determined structures with a mean root-mean-square deviation value of 3.4 Å. Further, we were able to classify high and low affinity ligands with the area under the curve values of 0.47, 0.60, and 0.69 for FXa, SYK, and TrmD, respectively, indicating the validity of our approach in at least two of the three systems. Detailed critical analysis of the results and CSAR methodology ranking procedures suggested that a straightforward application of our workflow has limitations, as some of the performance measures do not reflect the actual utility of pose and affinity predictions in the biological context of individual systems.

Saeed F, Salim N, Abdo A. Consensus methods for combining multiple clusterings of chemical structures. J Chem Inf Model 2013;53:1026-34. [PMID: 23581471 DOI: 10.1021/ci300442u] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]

MacCuish JD, MacCuish NE. Chemoinformatics applications of cluster analysis. Wiley Interdiscip Rev Comput Mol Sci 2013. [DOI: 10.1002/wcms.1152] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/02/2023]

Palacios-Bejarano B, Cerruela García G, Luque Ruiz I, Gómez-Nieto MÁ. QSAR model based on weighted MCS trees approach for the representation of molecule data sets. J Comput Aided Mol Des 2013;27:185-201. [DOI: 10.1007/s10822-013-9637-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/12/2012] [Accepted: 02/01/2013] [Indexed: 11/28/2022]

Saeed F, Salim N, Abdo A, Hentabli H. Graph-Based Consensus Clustering for Combining Multiple Clusterings of Chemical Structures. Mol Inform 2013;32:165-78. [PMID: 27481278 DOI: 10.1002/minf.201200110] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/20/2012] [Accepted: 12/09/2012] [Indexed: 11/10/2022]

Saeed F, Salim N, Abdo A. Voting-based consensus clustering for combining multiple clusterings of chemical structures. J Cheminform 2012;4:37. [PMID: 23244782 PMCID: PMC3541359 DOI: 10.1186/1758-2946-4-37] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 09/24/2012] [Accepted: 12/11/2012] [Indexed: 11/26/2022] [Open Access]

Abstract

BACKGROUND

Although many consensus clustering methods have been successfully used for combining multiple classifiers in many areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few consensus clustering methods have been applied for combining multiple clusterings of chemical structures. It is known that any individual clustering method will not always give the best results for all types of applications. So, in this paper, three voting and graph-based consensus clusterings were used for combining multiple clusterings of chemical structures to enhance the ability of separating biologically active molecules from inactive ones in each cluster.

RESULTS

The cumulative voting-based aggregation algorithm (CVAA), cluster-based similarity partitioning algorithm (CSPA) and hyper-graph partitioning algorithm (HGPA) were examined. The F-measure and the Quality Partition Index (QPI) were used to evaluate the clusterings, and the results were compared with Ward's clustering method. The MDL Drug Data Report (MDDR) dataset was used for the experiments and was represented by two 2D fingerprints, ALOGP and ECFP_4. The voting-based consensus clustering method outperformed Ward's method on both the F-measure and QPI for both ALOGP and ECFP_4 fingerprints, while the graph-based consensus clustering methods outperformed Ward's method only for ALOGP using QPI. The Jaccard and Euclidean distance measures were the methods of choice for generating the ensembles, giving the highest values for both criteria.

CONCLUSIONS

The results of the experiments show that consensus clustering methods can improve the effectiveness of chemical structures clusterings. The cumulative voting-based aggregation algorithm (CVAA) was the method of choice among consensus clustering methods.
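As a rough illustration of how a similarity-based consensus such as CSPA combines several clusterings, the sketch below builds a co-association matrix: the fraction of input partitions in which two molecules share a cluster. This is an assumption-laden sketch rather than the method used in the paper; the function name and the toy label vectors are hypothetical, and the graph-partitioning step that would normally follow is omitted.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster.

    partitions -- list of label lists, one per base clustering; all lists
                  must label the same items in the same order.
    """
    n_items = len(partitions[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for labels in partitions:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_partitions = len(partitions)
    return [[c / n_partitions for c in row] for row in counts]

# Three hypothetical base clusterings of four molecules.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
for row in co_association(partitions):
    print([round(v, 2) for v in row])
```

Label values only matter within a single partition, so the third clustering agrees with the first even though its labels are swapped. The resulting matrix can be passed to any similarity-based clustering or graph-partitioning routine to obtain the consensus partition.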

Hechinger M, Leonhard K, Marquardt W. What is Wrong with Quantitative Structure–Property Relations Models Based on Three-Dimensional Descriptors? J Chem Inf Model 2012;52:1984-93. [DOI: 10.1021/ci300246m] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 12/12/2022]

Rivera-Borroto OM, Rabassa-Gutiérrez M, Grau-Ábalo RDC, Marrero-Ponce Y, García-de la Vega JM. Dunn's index for cluster tendency assessment of pharmacological data sets. Can J Physiol Pharmacol 2012;90:425-33. [PMID: 22443093 DOI: 10.1139/y2012-002] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022]