1
|
Govil S, Tripathi S, Kumar A, Shrivastava D, Kumar S. Comparative Study for Prediction of Low and High Plasma Protein Binding Drugs by Various Machine Learning-Based Classification Algorithms. ASIAN JOURNAL OF PHARMACEUTICAL RESEARCH AND HEALTH CARE 2021. [DOI: 10.18311/ajprhc/2021/28497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
In the drug discovery path, most drug candidates fail at the early stages due to their pharmacokinetic behavior in the system. Early prediction of pharmacokinetic properties and screening methods can reduce the time and investment needed for lead discovery. Plasma protein binding is one of these properties and has a vital role in drug discovery and development. The focus of the current study is to develop a computational model for the classification of Low Plasma Protein Binding (LPPB) and High Plasma Protein Binding (HPPB) drugs using machine learning methods in WEKA, for early screening of molecules. Plasma protein binding data were collated from the Drug Bank database, where 617 drug candidates were found to interact with plasma proteins. From these, an equal proportion of high and low plasma protein binding drugs was extracted to build a training set of ~300 drugs. The machine learning algorithms were trained on the training set and evaluated on a test set. We also compared various machine learning-based classification algorithms, i.e., the Naïve Bayes algorithm, Instance-Based Learner (IBK), multilayer perceptron, and random forest, to determine the best model based on accuracy. The random forest model outperformed the other classification methods, with an accuracy of 99.67% and a kappa value of 0.9933 on the training set and on the test set, and can predict drug plasma binding capacity in the given data set using the WEKA tool.
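The accuracy and kappa figures quoted in this abstract can be related with a short calculation. The following is a minimal sketch of Cohen's kappa computed from a confusion matrix. The function name `accuracy_and_kappa` and the matrix values are hypothetical and are not taken from the paper; the matrix simply assumes 300 drugs in two balanced classes with a single misclassification.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts items of true class i predicted as class j.
    Raises ZeroDivisionError if chance agreement is exactly 1.
    """
    total = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: fraction of items on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical matrix (not from the paper): balanced classes, one error.
hypothetical = [[150, 0], [1, 149]]
acc, kappa = accuracy_and_kappa(hypothetical)
print(f"accuracy={acc:.4f} kappa={kappa:.4f}")
```

With two balanced classes the chance agreement is 0.5, so kappa works out to roughly twice the accuracy minus one.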
|
2
|
Abstract
Radial basis function networks (RBFNs) have gained widespread appeal amongst researchers and have shown good performance in a variety of application domains. They have potential for hybridization and demonstrate some interesting emergent behaviors. This paper aims to offer a compendious and sensible survey on RBF networks. The advantages they offer, such as fast training and global approximation capability with local responses, are attracting many researchers to use them in diversified fields. The overall algorithmic development of RBF networks is discussed, with special focus on their learning methods, novel kernels, and fine tuning of kernel parameters. In addition, we have considered the recent research work on multi-criteria optimization in RBF networks and a range of indicative application areas, along with some open source RBFN tools.
|
3
|
Affiliation(s)
- Giovanna Menardi
- Dipartimento di Scienze Statistiche; Università di Padova; via C. Battisti 241, 35121 Padova Italy
|
4
|
Script Identification from Printed Indian Document Images and Performance Evaluation Using Different Classifiers. APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING 2014. [DOI: 10.1155/2014/896128] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]
Abstract
Identification of script from document images is an active area of research under document image processing for a multilingual/multiscript country like India. In this paper the real life problem of printed script identification from official Indian document images is considered, and the performances of different well-known classifiers are evaluated. Two important evaluation parameters, namely AAR (average accuracy rate) and MBT (model building time), are computed for this performance analysis. The experiment was carried out on 459 printed document images with 5-fold cross-validation. The Simple Logistic model shows the highest AAR, 98.9%, among all. The BayesNet and Random Forest models have average accuracy rates of 96.7% and 98.2%, respectively, with the lowest MBT of 0.09 s.
|
5
|
The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives. Mach Learn 2013. [DOI: 10.1007/s10994-013-5334-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 10/27/2022]
|
6
|
Volkovich Z, Barzily Z, Avros R, Toledano-Kitai D. On Application of a Probabilistic K-Nearest Neighbors Model for Cluster Validation Problem. COMMUN STAT-THEOR M 2011. [DOI: 10.1080/03610926.2011.562786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
7
|
Volkovich Z, Barzily Z, Weber GW, Toledano-Kitai D, Avros R. Resampling approach for cluster model selection. Mach Learn 2011. [DOI: 10.1007/s10994-011-5236-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/18/2022]
|
8
|
Gupta G, Liu A, Ghosh J. Automated hierarchical density shaving: a robust automated clustering and visualization framework for large biological data sets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2010; 7:223-237. [PMID: 20431143 DOI: 10.1109/tcbb.2008.32] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/29/2023]
Abstract
A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster into one or more groups, and the rest need to be ignored. For such situations, we present Automated Hierarchical Density Shaving (Auto-HDS), a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. Auto-HDS can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criterion. Our framework also provides a simple yet powerful 2D visualization of the hierarchy of clusters that is useful for further interactive exploration. We present results on Gasch and Lee microarray data sets to show the effectiveness of our methods. Additional results on other biological data are included in the supplemental material.
|
9
|
|
10
|
Suresh S, Sundararajan N, Saratchandran P. A sequential multi-category classifier using radial basis function networks. Neurocomputing 2008. [DOI: 10.1016/j.neucom.2007.06.003] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Indexed: 11/30/2022]
|
11
|
Mu L, Wang F. A Scale-Space Clustering Method: Mitigating the Effect of Scale in the Analysis of Zone-Based Data. ACTA ACUST UNITED AC 2008. [DOI: 10.1080/00045600701734224] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Indexed: 10/22/2022]
|
12
|
Carreira-Perpiñán MA. Gaussian mean-shift is an EM algorithm. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2007; 29:767-76. [PMID: 17356198 DOI: 10.1109/tpami.2007.1057] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 05/14/2023]
Abstract
The mean-shift algorithm, based on ideas proposed by Fukunaga and Hostetler [16], is a hill-climbing algorithm on the density defined by a finite mixture or a kernel density estimate. Mean-shift can be used as a nonparametric clustering method and has attracted recent attention in computer vision applications such as image segmentation or tracking. We show that, when the kernel is Gaussian, mean-shift is an expectation-maximization (EM) algorithm and, when the kernel is non-Gaussian, mean-shift is a generalized EM algorithm. This implies that mean-shift converges from almost any starting point and that, in general, its convergence is of linear order. For Gaussian mean-shift, we show: 1) the rate of linear convergence approaches 0 (superlinear convergence) for very narrow or very wide kernels, but is often close to 1 (thus, extremely slow) for intermediate widths and exactly 1 (sublinear convergence) for widths at which modes merge, 2) the iterates approach the mode along the local principal component of the data points from the inside of the convex hull of the data points, and 3) the convergence domains are nonconvex and can be disconnected and show fractal behavior. We suggest ways of accelerating mean-shift based on the EM interpretation.
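The hill-climbing iteration described in this abstract can be sketched in a few lines. This is a minimal illustration that assumes an isotropic Gaussian kernel with a single `bandwidth`. The function name, the sample data, and the stopping rule are hypothetical and do not come from the paper. The comments marking the two steps only loosely mirror the EM reading that the abstract describes.

```python
import math


def gaussian_mean_shift(start, data, bandwidth, tol=1e-9, max_iter=1000):
    """Climb the Gaussian kernel density estimate of `data` from `start`.

    Each iteration replaces x with the Gaussian-weighted mean of the data.
    If `start` is so far from every point that all weights underflow to
    zero, the division below raises ZeroDivisionError.
    """
    x = list(start)
    for _ in range(max_iter):
        # E-like step: weight each data point by its Gaussian affinity to x.
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * bandwidth ** 2))
            for p in data
        ]
        total = sum(weights)
        # M-like step: move x to the weighted mean of the data points.
        new_x = [
            sum(w * p[d] for w, p in zip(weights, data)) / total
            for d in range(len(x))
        ]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


# Hypothetical 2-D data with two well-separated groups.
data = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
mode_a = gaussian_mean_shift((0.5, 0.5), data, bandwidth=0.5)
mode_b = gaussian_mean_shift((4.5, 4.5), data, bandwidth=0.5)
print(mode_a, mode_b)
```

Starting points near different groups climb to different modes, which is what makes the iteration usable as a nonparametric clustering method.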
Affiliation(s)
- Miguel A Carreira-Perpiñán
- Department of Computer Science and Electrical Engineering, OGI School of Science and Engineering, Oregon Health and Science University, Beaverton, OR 97006, USA.
|
13
|
Braga-Neto U, Goutsias J. Object-based image analysis using multiscale connectivity. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2005; 27:892-907. [PMID: 15943421 DOI: 10.1109/tpami.2005.124] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 05/02/2023]
Abstract
This paper introduces a novel approach for image analysis based on the notion of multiscale connectivity. We use the proposed approach to design several novel tools for object-based image representation and analysis which exploit the connectivity structure of images in a multiscale fashion. More specifically, we propose a nonlinear pyramidal image representation scheme, which decomposes an image at different scales by means of multiscale grain filters. These filters gradually remove connected components from an image that fail to satisfy a given criterion. We also use the concept of multiscale connectivity to design a hierarchical data partitioning tool. We employ this tool to construct another image representation scheme, based on the concept of component trees, which organizes partitions of an image in a hierarchical multiscale fashion. In addition, we propose a geometrically-oriented hierarchical clustering algorithm which generalizes the classical single-linkage algorithm. Finally, we propose two object-based multiscale image summaries, reminiscent of the well-known (morphological) pattern spectrum, which can be useful in image analysis and image understanding applications.
Affiliation(s)
- Ulisses Braga-Neto
- Virology and Experimental Therapy Laboratory of the Aggeu Magalhães Research Center--CPqAM/FIOCRUZ, Recife, PE Brazil.
|
14
|
Somanathan H, Borges RM, Chakravarthy VS. Does Neighborhood Floral Display Matter? Fruit Set in Carpenter Bee-pollinated Heterophragma quadriloculare and Beetle-pollinated Lasiosiphon eriocephalus. Biotropica 2004. [DOI: 10.1111/j.1744-7429.2004.tb00306.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
15
|
Somanathan H, Borges RM, Srinivasa Chakravarthy V. Does Neighborhood Floral Display Matter? Fruit Set in Carpenter Bee-pollinated Heterophragma quadriloculare and Beetle-pollinated Lasiosiphon eriocephalus. Biotropica 2004. [DOI: 10.1646/q1572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
16
|
Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing 2003. [DOI: 10.1016/s0925-2312(02)00632-x] [Citation(s) in RCA: 128] [Impact Index Per Article: 5.8] [Indexed: 10/26/2022]
|
17
|
Tambouratzis G, Tambouratzis T, Tambouratzis D. Clustering with artificial neural networks and traditional techniques. INT J INTELL SYST 2003. [DOI: 10.1002/int.10095] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
|
18
|
|
19
|
|
20
|
Abstract
BACKGROUND: Analytical flow cytometry (AFC), by quantifying sometimes more than 10 optical parameters on cells at rates of approximately 10^3 cells/s, rapidly generates vast quantities of multidimensional data, which provides a considerable challenge for data analysis. We review the application of multivariate data analysis and pattern recognition techniques to flow cytometry. METHODS: Approaches were divided into two broad types depending on whether the aim was identification or clustering. Multivariate statistical approaches, supervised artificial neural networks (ANNs), problems of overlapping character distributions, unbounded data sets, missing parameters, scaling up, and estimating proportions of different types of cells comprised the first category. Classic clustering methods, fuzzy clustering, and unsupervised ANNs comprised the second category. We demonstrate the state of the art by using AFC data on marine phytoplankton populations. RESULTS AND CONCLUSIONS: Information held within the large quantities of data generated by AFC was tractable using ANNs, but for field studies the problem of obtaining suitable training data needs to be resolved, and coping with an almost infinite number of cell categories needs further research.
Affiliation(s)
- L Boddy
- Cardiff School of Biosciences, Cardiff University, Cardiff, United Kingdom.
|
21
|
Wilkins MF, Hardy SA, Boddy L, Morris CW. Comparison of five clustering algorithms to classify phytoplankton from flow cytometry data. CYTOMETRY 2001; 44:210-7. [PMID: 11429771 DOI: 10.1002/1097-0320(20010701)44:3<210::aid-cyto1113>3.0.co;2-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Artificial neural networks (ANNs) have been shown to be valuable in the analysis of analytical flow cytometric (AFC) data in aquatic ecology. Automated extraction of clusters is an important first stage in deriving ANN training data from field samples, but AFC data pose a number of challenges for many types of clustering algorithm. The fuzzy k-means algorithm recently has been extended to address nonspherical clusters with the use of scatter matrices. Four variants were proposed, each optimizing a different measure of clustering "goodness." METHODS: With AFC data obtained from marine phytoplankton species in culture, the four fuzzy k-means algorithm variants were compared with each other and with another multivariate clustering algorithm based on critical distances currently used in flow cytometry. RESULTS: One of the algorithm variants (adaptive distances, also known as the Gustafson-Kessel algorithm) was found to be robust and reliable, whereas the others showed various problems. CONCLUSIONS: The adaptive distances algorithm was superior in use to the clustering algorithms against which it was tested, but the problem of automatic determination of the number of clusters remains to be addressed.
Affiliation(s)
- M F Wilkins
- Cardiff School of Biosciences, Cardiff, United Kingdom
|
22
|
Wan C, Harrington PDB. Self-Configuring Radial Basis Function Neural Networks for Chemical Pattern Recognition. ACTA ACUST UNITED AC 1999. [DOI: 10.1021/ci990306t] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Affiliation(s)
- Chuanhao Wan
- Clippinger Laboratories, Ohio University Center for Intelligent Chemical Instrumentation, Ohio University, Athens, Ohio 45701-2979
- Peter de B. Harrington
- Clippinger Laboratories, Ohio University Center for Intelligent Chemical Instrumentation, Ohio University, Athens, Ohio 45701-2979
|
23
|
|
24
|
Hong X, Billings SA. Dual-orthogonal radial basis function networks for nonlinear time series prediction. Neural Netw 1998; 11:479-493. [PMID: 12662824 DOI: 10.1016/s0893-6080(97)00132-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 10/18/2022]
Abstract
A new structure of Radial Basis Function (RBF) neural network called the Dual-orthogonal RBF Network (DRBF) is introduced for nonlinear time series prediction. The hidden nodes of a conventional RBF network compare the Euclidean distance between the network input vector and the centres, and the node responses are radially symmetrical. But in time series prediction, where the system input vectors are lagged system outputs that are usually highly correlated, the Euclidean distance measure may not be appropriate. The DRBF network modifies the distance metric by introducing a classification function which is based on the estimation data set. Training the DRBF network consists of two stages: learning the classification-related basis functions and the important input nodes, followed by selecting the regressors and learning the weights of the hidden nodes. In both stages, a forward Orthogonal Least Squares (OLS) selection procedure is applied, initially to select the important input nodes and then to select the important centres. Simulation results of single-step and multi-step ahead predictions over a test data set are included to demonstrate the effectiveness of the new approach.
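For orientation, the conventional RBF hidden layer that this abstract contrasts with the DRBF network can be sketched as follows. This is a minimal illustration of the conventional case only, not of the DRBF method. The Gaussian basis function, the shared `width` parameter, and all names are assumptions, since the abstract states only that node responses depend on Euclidean distance to the centres and are radially symmetrical.

```python
import math


def rbf_hidden_layer(x, centres, width):
    """Return one Gaussian response per centre for input vector x.

    Each response depends on x only through its Euclidean distance to the
    centre, which is what makes it radially symmetrical.
    """
    responses = []
    for c in centres:
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        responses.append(math.exp(-dist_sq / (2 * width ** 2)))
    return responses


def rbf_output(x, centres, width, weights, bias=0.0):
    """Linear combination of the hidden responses (a single network output)."""
    hidden = rbf_hidden_layer(x, centres, width)
    return bias + sum(w * h for w, h in zip(weights, hidden))


# Hypothetical centres and weights, chosen only to show the call shape.
centres = [(0.0, 0.0), (1.0, 1.0)]
print(rbf_output((0.5, 0.5), centres, width=1.0, weights=[0.3, 0.7]))
```

Two inputs at the same distance from a centre produce the same response regardless of direction, which is the property the abstract argues may be inappropriate for highly correlated lagged inputs.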
Affiliation(s)
- X Hong
- Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield, UK
|
25
|
|