1
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, because big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
2
de Sire A, Gallelli L, Marotta N, Lippi L, Fusco N, Calafiore D, Cione E, Muraca L, Maconi A, De Sarro G, Ammendolia A, Invernizzi M. Vitamin D Deficiency in Women with Breast Cancer: A Correlation with Osteoporosis? A Machine Learning Approach with Multiple Factor Analysis. Nutrients 2022; 14:1586. [PMID: 35458148 PMCID: PMC9031622 DOI: 10.3390/nu14081586] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/31/2022] [Revised: 04/08/2022] [Accepted: 04/09/2022] [Indexed: 12/16/2022]
Abstract
Breast cancer (BC) is the most frequent malignant tumor in women in Europe and North America, and the use of aromatase inhibitors (AIs) is recommended in women affected by estrogen receptor-positive BCs. AIs, by inhibiting the enzyme that converts androgens into estrogen, cause a decrement in bone mineral density (BMD), with a consequent increased risk of fragility fractures. This study aimed to evaluate the role of vitamin D3 deficiency in women with breast cancer and its correlation with osteoporosis and BMD modifications. This observational cross-sectional study collected the following data regarding bone health: osteoporosis and osteopenia diagnosis, lumbar spine (LS) and femoral neck BMD, and serum levels of 25-hydroxyvitamin D3 (25(OH)D3), calcium, and parathyroid hormone. The study included 54 women with BC, mean age 67.3 ± 8.16 years. Given a significantly low correlation with the LS BMD value (r² = 0.30, p = 0.025), we assessed the role of vitamin D3 via multiple factor analysis and found that BMD and vitamin D3 contributed to the arrangement of clusters, reported as vectors, providing similar trajectories of influence to the construction of the machine learning model. Thus, in a cohort of women with BC undergoing AIs, we identified a very low prevalence (5.6%) of patients with adequate bone health and a normal vitamin D3 status. According to our cluster model, we may conclude that the assessment and management of bone health and vitamin D3 status are crucial in BC survivors.
Affiliation(s)
- Alessandro de Sire
  - Physical Medicine and Rehabilitation Unit, Department of Medical and Surgical Sciences, University of Catanzaro “Magna Graecia”, 88100 Catanzaro, Italy
- Luca Gallelli
  - Operative Unit of Clinical Pharmacology, Mater Domini University Hospital, Department of Health Science, University of Catanzaro “Magna Graecia”, 88100 Catanzaro, Italy
  - Research Center FAS@UMG, Department of Health Science, University of Catanzaro “Magna Graecia”, 88100 Catanzaro, Italy
- Nicola Marotta
  - Physical Medicine and Rehabilitation Unit, Department of Medical and Surgical Sciences, University of Catanzaro “Magna Graecia”, 88100 Catanzaro, Italy
- Lorenzo Lippi
  - Translational Medicine, Dipartimento Attività Integrate Ricerca e Innovazione (DAIRI), Azienda Ospedaliera SS. Antonio e Biagio e Cesare Arrigo, 15121 Alessandria, Italy
  - Department of Health Sciences, University of Eastern Piedmont “A. Avogadro”, 28100 Novara, Italy
- Nicola Fusco
  - Department of Oncology and Hemato-Oncology, University of Milan, 20126 Milan, Italy
  - Division of Pathology, IEO, European Institute of Oncology IRCCS, 20141 Milan, Italy
- Dario Calafiore
  - Physical Medicine and Rehabilitation Unit, Department of Neurosciences, ASST Carlo Poma, 46100 Mantova, Italy
- Erika Cione
  - Department of Pharmacy, Health and Nutritional Sciences, Department of Excellence 2018–2022, University of Calabria, 87036 Rende, Italy
- Lucia Muraca
  - Department of General Medicine, ASP 7, 88100 Catanzaro, Italy
- Antonio Maconi
  - Translational Medicine, Dipartimento Attività Integrate Ricerca e Innovazione (DAIRI), Azienda Ospedaliera SS. Antonio e Biagio e Cesare Arrigo, 15121 Alessandria, Italy
- Giovambattista De Sarro
  - Operative Unit of Clinical Pharmacology, Mater Domini University Hospital, Department of Health Science, University of Catanzaro “Magna Graecia”, 88100 Catanzaro, Italy
  - Research Center FAS@UMG, Department of Health Science, University of Catanzaro “Magna Graecia”, 88100 Catanzaro, Italy
- Antonio Ammendolia
  - Physical Medicine and Rehabilitation Unit, Department of Medical and Surgical Sciences, University of Catanzaro “Magna Graecia”, 88100 Catanzaro, Italy
- Marco Invernizzi
  - Translational Medicine, Dipartimento Attività Integrate Ricerca e Innovazione (DAIRI), Azienda Ospedaliera SS. Antonio e Biagio e Cesare Arrigo, 15121 Alessandria, Italy
  - Department of Health Sciences, University of Eastern Piedmont “A. Avogadro”, 28100 Novara, Italy