Reference Citation Analysis

For: Tomal JH, Welch WJ, Zamar RH. Exploiting Multiple Descriptor Sets in QSAR Studies. J Chem Inf Model 2016;56:501-9. [DOI: 10.1021/acs.jcim.5b00663] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]

Cited by Other Article(s)

Tomal JH, Welch WJ, Zamar RH. Robust ranking by ensembling of diverse models and assessment metrics. J Stat Comput Sim 2022. [DOI: 10.1080/00949655.2022.2093873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]

Chowdhury RI, Tomal JH. Risk prediction for repeated measures health outcomes: A divide and recombine framework. Informatics in Medicine Unlocked 2022. [DOI: 10.1016/j.imu.2022.100847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

Mao J, Akhtar J, Zhang X, Sun L, Guan S, Li X, Chen G, Liu J, Jeon HN, Kim MS, No KT, Wang G. Comprehensive strategies of machine-learning-based quantitative structure-activity relationship models. iScience 2021;24:103052. [PMID: 34553136 PMCID: PMC8441174 DOI: 10.1016/j.isci.2021.103052] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Indexed: 11/21/2022]

Affiliation(s)

Jiashun Mao: The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea; Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055, China
Javed Akhtar: Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China
Xiao Zhang: Shanghai Rural Commercial Bank Co., Ltd, Shanghai 200002, China
Liang Sun: Department of Physics, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
Shenghui Guan: Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055, China
Xinyu Li: School of Life and Health Sciences and Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
Guangming Chen: Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China
Jiaxin Liu: Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
Hyeon-Nae Jeon: Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
Min Sung Kim: Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
Kyoung Tai No: The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
Guanyu Wang: Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China

Hsu GG, Tomal JH, Welch WJ. EPX: An R package for the ensemble of subsets of variables for highly unbalanced binary classification. Comput Biol Med 2021;136:104760. [PMID: 34416572 DOI: 10.1016/j.compbiomed.2021.104760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/06/2021] [Revised: 08/06/2021] [Accepted: 08/07/2021] [Indexed: 11/26/2022]

Abstract

BACKGROUND AND OBJECTIVE

In binary classification problems with a rare class of interest, relatively little information is available about the rare class for building a model. At the same time, the set of potentially useful variables can be high-dimensional. For example, in drug discovery there are usually very few bioactive compounds in a large chemical library, whereas thousands of potentially useful explanatory variables characterize a compound's chemical structure. This sparsity of information about the rare class makes it difficult for standard classification models to exploit the richness of the feature variables. The objective of this paper is therefore to develop an R package that clusters the feature variables into diverse subsets, which are then aggregated into a powerful ensemble for detecting rare-class objects.

METHODS

The ensemble of phalanxes (EPX) builds a classifier by exploiting the richness of feature variables using several diverse subsets of variables, called phalanxes, and outperforms many competitive state-of-the-art classification methods in terms of predictive ranking of the rare class of interest.

RESULTS

We present an R package EPX which implements the algorithm to form the ensemble of phalanxes as well as its associated functions. We further show how the ensemble of phalanxes can be constructed using parallel computing to lower the computational burden given high-dimensional data.

CONCLUSIONS

The R package EPX provides a flexible way of clustering the feature-variable space into smaller, diverse subsets of variables to develop an ensemble of phalanxes that better ranks rare-class objects in a highly unbalanced two-class classification problem. The EPX ensemble will be useful for detecting rare drug-like active biomolecules in drug discovery (Tomal et al., Mar. 2016) [1] and homologous proteins, using similarity scores of amino acid sequences, in protein homology (Tomal et al., 2019) [2]. The package is freely available from CRAN (https://CRAN.R-project.org/package=EPX).
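
The general idea in this abstract, fitting one model per subset of variables and averaging their scores to rank rare-class candidates, can be illustrated with a minimal Python sketch. This is not the EPX package or its phalanx-formation algorithm: the variable subsets are fixed by hand rather than found by clustering, the base learner is a toy nearest-centroid scorer chosen only to keep the example dependency-free, and the names `centroid_score`, `ensemble_rank`, and the toy data are all hypothetical.

```python
from statistics import mean


def centroid_score(train_X, train_y, cols):
    """Fit a toy nearest-centroid scorer on the given columns (assumed base learner).

    Returns a function mapping a row to a score in [0, 1]; a higher score means
    the row lies closer to the rare-class (y == 1) centroid.
    """
    pos = [[row[c] for c in cols] for row, y in zip(train_X, train_y) if y == 1]
    neg = [[row[c] for c in cols] for row, y in zip(train_X, train_y) if y == 0]
    pos_centroid = [mean(col) for col in zip(*pos)]
    neg_centroid = [mean(col) for col in zip(*neg)]

    def sq_dist(point, centroid):
        return sum((p - q) ** 2 for p, q in zip(point, centroid))

    def score(row):
        point = [row[c] for c in cols]
        d_pos = sq_dist(point, pos_centroid)
        d_neg = sq_dist(point, neg_centroid)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total

    return score


def ensemble_rank(train_X, train_y, test_X, subsets):
    """Average the per-subset scores and rank test rows, best candidate first."""
    scorers = [centroid_score(train_X, train_y, cols) for cols in subsets]
    averaged = [mean(s(row) for s in scorers) for row in test_X]
    order = sorted(range(len(test_X)), key=lambda i: averaged[i], reverse=True)
    return order, averaged


# Toy, unbalanced data: one rare-class row (label 1) among four majority rows.
train_X = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.2, 0.9, 0.8],
    [0.2, 0.1, 0.8, 0.9],
    [0.0, 0.3, 1.0, 0.7],
    [0.1, 0.1, 0.9, 0.9],
]
train_y = [1, 0, 0, 0, 0]
test_X = [
    [0.15, 0.20, 0.85, 0.80],  # resembles the majority class
    [0.85, 0.90, 0.20, 0.10],  # resembles the rare class
]
# Hand-picked subsets of column indices standing in for phalanxes.
subsets = [[0, 1], [2, 3]]

order, scores = ensemble_rank(train_X, train_y, test_X, subsets)
print(order, [round(s, 3) for s in scores])
```

Averaging scores across subsets is only one possible aggregation rule; it is used here because it keeps the ranking step easy to follow.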

Zhang X, Niu W, Tang T, Hou C, Guo Y, Kong R. A Strategy to Find Novel Candidate DKAs Inhibitors Using Modified QSAR Model with Favorable Druggability Properties. Chem Res Chin Univ 2019. [DOI: 10.1007/s40242-019-9183-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Choi H, Kang H, Chung KC, Park H. Development and application of a comprehensive machine learning program for predicting molecular biochemical and pharmacological properties. Phys Chem Chem Phys 2019;21:5189-5199. [PMID: 30775759 DOI: 10.1039/c8cp07002d] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/19/2022]

Subramanian G, Poda G. In silico ligand-based modeling of hBACE-1 inhibitors. Chem Biol Drug Des 2017;91:817-827. [PMID: 29139199 DOI: 10.1111/cbdd.13147] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/27/2017] [Revised: 10/24/2017] [Accepted: 11/01/2017] [Indexed: 02/06/2023]

Tromelin A, Chabanet C, Audouze K, Koensgen F, Guichard E. Multivariate statistical analysis of a large odorants database aimed at revealing similarities and links between odorants and odors. Flavour Frag J 2017. [DOI: 10.1002/ffj.3430] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/15/2022]

Qi M, Wang T, Yi Y, Gao N, Kong J, Wang J. Joint L_2,1 Norm and Fisher Discrimination Constrained Feature Selection for Rational Synthesis of Microporous Aluminophosphates. Mol Inform 2016;36. [PMID: 27863104 DOI: 10.1002/minf.201600076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/25/2016] [Accepted: 10/21/2016] [Indexed: 11/11/2022]

Baskin II, Winkler D, Tetko IV. A renaissance of neural networks in drug discovery. Expert Opin Drug Discov 2016;11:785-95. [PMID: 27295548 DOI: 10.1080/17460441.2016.1201262] [Citation(s) in RCA: 123] [Impact Index Per Article: 15.4] [Indexed: 01/20/2023]