1
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research because of constraints on data acquisition, such as time, cost, ethics, privacy, security, and technical limitations. Big data have been the focus of the past decade, whereas small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. The small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
2
Goel A, Goel AK, Kumar A. The role of artificial neural network and machine learning in utilizing spatial information. Spatial Information Research 2023; 31:275-285. [PMCID: PMC9673209 DOI: 10.1007/s41324-022-00494-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/24/2022] [Revised: 10/23/2022] [Accepted: 10/25/2022] [Indexed: 01/10/2024]
Abstract
In the age of the fourth industrial revolution, the digital world has a plethora of data, including data from the internet of things, mobile devices, cybersecurity, social media, forecasts, health records, and so on. The expertise of machine learning and artificial intelligence (AI) is required to soundly evaluate the data and develop related smart and automated applications. These fields use a variety of machine learning techniques, including supervised, unsupervised, and reinforcement learning. The objective of the study is to present the role of artificial neural networks (ANNs) and machine learning in utilizing spatial information. Machine learning and AI play an increasingly important role in disaster risk reduction, from hazard mapping and forecasting severe occurrences to real-time event detection, situational awareness, and decision assistance. The applications examined in the study to analyze the various ANN domains include weather forecasting, medical diagnosis, aerospace, facial recognition, stock market, social media, signature verification, forensics, robotics, electronics hardware, defense, and seismic data gathering. Machine learning builds prediction models for problems involving classification, regression, and clustering using known variables and locations from the training dataset. Spatial data that is based on tabular data creates different observations that are geographically related to one another for unknown factors and places. The study reports that the recurrent neural network and the convolutional neural network are the best methods in spatial information processing, healthcare, and weather forecasting, with greater than 90% accuracy.
Affiliation(s)
- Akash Goel
  - Department of Computer Science & Engineering, Galgotia’s University, Greater Noida, NCR India
- Amit Kumar Goel
  - Department of Computer Science & Engineering, Galgotia’s University, Greater Noida, NCR India
- Adesh Kumar
  - Department of Electrical & Electronics Engineering, School of Engineering, University of Petroleum and Energy Studies, Dehradun, India
3
Combined Prediction Method for Thermal Conductivity of Asphalt Concrete Based on Meso-Structure and Renormalization Technology. Applied Sciences-Basel 2022. [DOI: 10.3390/app12020857] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
The pavement temperature field affects pavement service life and the thermal environment near the road surface; thus, it is important for sustainable pavement design. This paper develops a combined prediction method, based on meso-structure and renormalization technology, for the thermal conductivity of asphalt concrete, a property that is critical for determining the pavement temperature field. The accuracy of the combined prediction method was verified by laboratory experiments. Using the verified model, the effects of coarse aggregate type, shape, content, and spatial orientation, as well as the air void content of the asphalt concrete and the steel fiber content, on the effective thermal conductivity were analyzed. The results show that the orientation angle and aspect ratio of the aggregate have a combined effect on thermal conductivity. In general, when the aggregate orientation is parallel to the heat conduction direction, the effective thermal conductivity of asphalt concrete in that direction tends to be greater. The effective thermal conductivity of asphalt concrete decreases as the coarse aggregate content or steel fiber content decreases or as the porosity increases, and it increases as the effective thermal conductivity of the coarse aggregate increases.