1. Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816] [PMCID: PMC10999174] [DOI: 10.1021/acs.chemrev.3c00189]
Abstract
Small data are common in scientific and engineering research because of constraints such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus of the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. The small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI) that enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made on small data challenges in ML and DL over the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including the chemical and biological sciences. We review basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), as well as more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, the combination of deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
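As a toy illustration of the simplest family of methods this review names (a sketch, not code from the review itself), a k-nearest-neighbor classifier — often a strong baseline in small-data regimes precisely because it has no parameters to overfit — can be written in a few lines of NumPy; the example data below are invented:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        # Euclidean distance from x to every training sample
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(d)[:k]]
        # majority vote among the k nearest labels
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# tiny two-cluster example (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([[0.05, 0.05], [1.05, 1.0]])))  # → [0 1]
```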
Affiliation(s)
- Bozheng Dou
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
2. Optimum Feature Selection with Particle Swarm Optimization to Face Recognition System Using Gabor Wavelet Transform and Deep Learning. BIOMED RESEARCH INTERNATIONAL 2021; 2021:6621540. [PMID: 33778071] [PMCID: PMC7969091] [DOI: 10.1155/2021/6621540]
Abstract
In this study, a new approach that combines the Gabor wavelet transform with deep learning for symmetry face databases is presented. The proposed face recognition system was developed for use in different applications. The Gabor wavelet transform is used for feature extraction of the symmetry face training data, and a deep learning method is then used for recognition. The proposed method was implemented and evaluated on the ORL and YALE databases with MATLAB 2020a, and the same experiments were also conducted with particle swarm optimization (PSO) as the feature selection approach. Gabor wavelet feature extraction with a large number of training image samples proved more effective than the other methods in our study. Without PSO-based feature selection, the recognition rate is 85.42% on the ORL database and 92% on the YALE database; applying the PSO algorithm increases the accuracy to 96.22% for the ORL database and 94.66% for the YALE database.
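The Gabor feature extraction such a pipeline relies on starts from a bank of oriented kernels. A minimal NumPy sketch of the real-valued Gabor kernel (a Gaussian envelope times a cosine carrier) is shown below; all parameter values here are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate the coordinate frame by theta
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lam + psi)
    return envelope * carrier

# a small bank of 4 orientations; filter responses over image regions serve as features
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving a face image with each kernel in the bank and pooling the responses yields the feature vector that a selection method such as PSO would then prune.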
5. Shaimaa A. El-said. Reliable Face Recognition Using Artificial Neural Network. INTERNATIONAL JOURNAL OF SYSTEM DYNAMICS APPLICATIONS 2013. [DOI: 10.4018/ijsda.2013040102]
Abstract
Facial detection and recognition are among the most heavily researched fields of computer vision and image processing. Most current face recognition techniques suffer when noise affects the global features or the local pixel intensities of the images under consideration. The proposed Reliable Face Recognition System (RFRS) employs, for the first time, a combination of a Gabor filter (GF) and principal component analysis (PCA) for efficient feature extraction, together with an ANN for classification. Faces are detected in noisy images by training the network several times on various inputs: both ideal and noisy images of faces. Applying the GF before PCA reduces the sensitivity of PCA to noise and provides a greater level of invariance, and the ANN is trained on different sets of noisy images. The output of the ANN is a vector whose length equals the number of distinct subjects in the Olivetti Research Laboratory (ORL) database; the ANN is trained to output a 1 in the correct position of the output vector and to fill the rest of the vector with 0's. Experiments on RFRS were carried out using the ORL dataset. The results show that training the network on noisy face images greatly reduces its errors when classifying or recognizing noisy images. For noisy face images, the network made no errors for faces with noise of mean 0.00 or 0.05, with an average recognition rate varying from 96.8% to 98%; when noise of mean 0.10 is added to the images, the network begins to make errors. For noiseless face images, the proposed system achieves correct classification. A performance comparison between RFRS and other face recognition techniques shows that in most cases RFRS performs better than conventional techniques under different types of noise, demonstrating the high robustness of the proposed algorithm.
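Two pieces of this pipeline are compact enough to sketch: the PCA compression of the feature vectors and the one-hot target encoding the ANN is trained against. The following is a toy illustration (random vectors stand in for Gabor features, and all dimensions are invented):

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the top principal components (via SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T          # shape (n_samples, n_components)

def one_hot(labels, n_classes):
    """Target vectors with a 1 at the subject's index and 0 everywhere else."""
    T = np.zeros((len(labels), n_classes))
    T[np.arange(len(labels)), labels] = 1.0
    return T

X = np.random.default_rng(0).normal(size=(8, 50))    # 8 face feature vectors (toy)
Z = pca_project(X, 3)                                # compressed inputs for the ANN
T = one_hot(np.array([0, 1, 2, 3, 0, 1, 2, 3]), 4)   # ANN training targets
```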
Affiliation(s)
- Shaimaa A. El-said
- Department of Electronics and Communications, Zagazig University, Zagazig, Egypt
7. Foresti GL. Invariant feature extraction and neural trees for range surface classification. IEEE Trans Syst Man Cybern B Cybern 2002; 32:356-66. [PMID: 18238133] [DOI: 10.1109/tsmcb.2002.999811]
Abstract
In this paper, a neural tree-based approach for classifying range images into a set of nonoverlapping regions is presented. An innovative procedure is applied to extract invariant surface features from each pixel of the range image. These features are 1) robust to noise and 2) invariant to scale, shift, rotation, curvature variations, and the direction of the normal. A generalized neural tree is then used to classify each image point as belonging to one of the six surface models of differential geometry, i.e., peak, ridge, valley, saddle, pit, and flat. Comparisons with other methods and experiments on both synthetic and real three-dimensional range images are presented.
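The six surface models correspond to classical sign combinations of the mean curvature H and the Gaussian curvature K. The paper learns this mapping with a generalized neural tree; the analytic reference rule, under one common sign convention for the surface normal (an assumption on our part), looks like this:

```python
def surface_type(H, K, eps=1e-6):
    """Label a surface point from the signs of mean (H) and Gaussian (K) curvature.

    Which sign of H means "peak" vs "pit" depends on the normal orientation;
    this sketch assumes outward-pointing normals.
    """
    if abs(H) < eps and abs(K) < eps:
        return "flat"                           # both curvatures vanish
    if K < -eps:
        return "saddle"                         # principal curvatures of opposite sign
    if abs(K) < eps:
        return "ridge" if H < 0 else "valley"   # one principal curvature is zero
    return "peak" if H < 0 else "pit"           # K > 0: both curvatures same sign

print(surface_type(H=0.0, K=0.0))    # → flat
print(surface_type(H=-0.5, K=0.2))   # → peak
```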
Affiliation(s)
- G L Foresti
- Department of Mathematics and Computer Science (DIMI), University of Udine, Italy
9.
Abstract
A new neural tree model, called adaptive high-order neural tree (AHNT), is proposed for classifying large sets of multidimensional patterns. The AHNT is built by recursively dividing the training set into subsets and by assigning each subset to a different child node. Each node is composed of a high-order perceptron (HOP) whose order is automatically tuned taking into account the complexity of the pattern set reaching that node. First-order nodes divide the input space with hyperplanes, while HOPs divide the input space arbitrarily, but at the expense of increased complexity. Experimental results demonstrate that the AHNT generalizes better than trees with homogeneous nodes, produces small trees and avoids the use of complex comparative statistical tests and/or a priori selection of large parameter sets.
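The key distinction the abstract draws — first-order nodes split with hyperplanes, while high-order perceptrons (HOPs) split with curved surfaces — can be demonstrated on XOR, which no single hyperplane separates but a second-order node handles easily. This sketch (not the AHNT algorithm itself) trains a classic perceptron on inputs expanded with the product term x1*x2:

```python
import numpy as np

def expand2(X):
    """Second-order feature map: [x1, x2] -> [x1, x2, x1*x2]."""
    return np.column_stack([X, X[:, 0] * X[:, 1]])

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Classic perceptron rule; converges here because the expanded set is separable."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1.0 if xi @ w + b > 0 else 0.0
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)   # XOR: not linearly separable in 2-D
Xh = expand2(X)                           # but separable once x1*x2 is added
w, b = train_perceptron(Xh, y)
preds = (Xh @ w + b > 0).astype(float)
print(preds)  # → [0. 1. 1. 0.]
```

A first-order node would never fit these labels; the single quadratic term is what lets one HOP carve the input space "arbitrarily," at the cost of more weights per node.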
Affiliation(s)
- G L Foresti
- Department of Mathematics and Computer Science (DIMI), University of Udine, Via delle Scienze, 208-33100 Udine, Italy.
11. Zhang M, Zhang JC, Fulcher J. Higher-order neural network group models for financial simulation. Int J Neural Syst 2000; 10:123-42. [PMID: 10939345] [DOI: 10.1142/s0129065700000119]
Abstract
Real-world financial data are often discontinuous and non-smooth, so accuracy becomes a problem if we attempt to simulate such functions with single neural networks. Neural network group models perform this task much better. Both Polynomial Higher Order Neural network Group (PHONG) and Trigonometric polynomial Higher Order Neural network Group (THONG) models are developed. These HONG models are open-box, convergent models capable of approximating any kind of piecewise continuous function to any degree of accuracy; moreover, they can handle higher-frequency, higher-order nonlinear, and discontinuous data. Results obtained using a Higher Order Neural network Group financial simulator are presented, which confirm that HONG group models converge without difficulty and are considerably more accurate than single neural network models (around twice as good for prediction, and a factor of four better for simulation).
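The group idea — one smooth model per continuous piece of a discontinuous function — can be illustrated with ordinary least-squares polynomial fits standing in for the polynomial HONNs (this is a sketch of the principle, not the PHONG training procedure; the target function and breakpoint are invented):

```python
import numpy as np

# a discontinuous, piecewise-smooth target with a jump at x = 0
def target(x):
    return np.where(x < 0, x**2, 2.0 + 0.5 * x)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, 200))
y = target(x)

# one global cubic fit vs. a "group" of two cubics, one per continuous piece
global_fit = np.polyval(np.polyfit(x, y, 3), x)
left, right = x < 0, x >= 0
group_fit = np.empty_like(y)
group_fit[left] = np.polyval(np.polyfit(x[left], y[left], 3), x[left])
group_fit[right] = np.polyval(np.polyfit(x[right], y[right], 3), x[right])

print(np.abs(global_fit - y).max())  # large: a single smooth model misses the jump
print(np.abs(group_fit - y).max())   # near zero: each piece is fit exactly
```

The single cubic is forced to smear the jump across the whole interval, while the group matches each continuous piece independently — the same reason a group of networks outperforms one network on discontinuous financial series.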
Affiliation(s)
- M Zhang
- Department of Computing & Information Systems, University of Western Sydney, Macarthur, Campbelltown, NSW, Australia.