1
|
Carracedo-Reboredo P, Liñares-Blanco J, Rodríguez-Fernández N, Cedrón F, Novoa FJ, Carballal A, Maojo V, Pazos A, Fernandez-Lozano C. A review on machine learning approaches and trends in drug discovery. Comput Struct Biotechnol J 2021; 19:4538-4558. [PMID: 34471498 PMCID: PMC8387781 DOI: 10.1016/j.csbj.2021.08.011] [Citation(s) in RCA: 163] [Impact Index Per Article: 40.8] [Received: 03/01/2021] [Revised: 08/06/2021] [Accepted: 08/06/2021] [Indexed: 12/30/2022] Open
Abstract
Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, this search has come to rely heavily on computer science, driven by the rapid growth and democratization of machine learning techniques. Given the objectives set by the Precision Medicine initiative and the new challenges it has generated, robust, standard and reproducible computational methodologies are needed to meet them. Predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies, a stage that drastically reduces the cost and research time of discovering new drugs. This review article focuses on how these methodologies have been used in recent research. Analyzing the state of the art in this field gives an idea of where cheminformatics will develop in the short term, the limitations it presents and the positive results it has achieved. The review focuses mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years.
Key Words
- ADMET, Absorption, distribution, metabolism, elimination and toxicity
- ADR, Adverse Drug Reaction
- AI, Artificial Intelligence
- ANN, Artificial Neural Networks
- APFP, Atom Pairs 2d FingerPrint
- AUC, Area under the Curve
- BBB, Blood–Brain barrier
- CDK, Chemistry Development Kit
- CNN, Convolutional Neural Networks
- CNS, Central Nervous System
- CPI, Compound-protein interaction
- CV, Cross Validation
- Cheminformatics
- DL, Deep Learning
- DNA, Deoxyribonucleic acid
- Deep Learning
- Drug Discovery
- ECFP, Extended Connectivity Fingerprints
- FDA, Food and Drug Administration
- FNN, Fully Connected Neural Networks
- FP, Fingerprints
- FS, Feature Selection
- GCN, Graph Convolutional Networks
- GEO, Gene Expression Omnibus
- GNN, Graph Neural Networks
- GO, Gene Ontology
- KEGG, Kyoto Encyclopedia of Genes and Genomes
- MACCS, Molecular ACCess System
- MCC, Matthews correlation coefficient
- MD, Molecular Descriptors
- MKL, Multiple Kernel Learning
- ML, Machine Learning
- Machine Learning
- Molecular Descriptors
- NB, Naive Bayes
- OOB, Out of Bag
- PCA, Principal Component Analysis
- QSAR
- QSAR, Quantitative structure–activity relationship
- RF, Random Forest
- RNA, Ribonucleic Acid
- SMILES, simplified molecular-input line-entry system
- SVM, Support Vector Machines
- TCGA, The Cancer Genome Atlas
- WHO, World Health Organization
- t-SNE, t-Distributed Stochastic Neighbor Embedding
|
Review |
4 |
163 |
2
|
Adamidi ES, Mitsis K, Nikita KS. Artificial intelligence in clinical care amidst COVID-19 pandemic: A systematic review. Comput Struct Biotechnol J 2021; 19:2833-2850. [PMID: 34025952 PMCID: PMC8123783 DOI: 10.1016/j.csbj.2021.05.010] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 02/17/2021] [Revised: 05/01/2021] [Accepted: 05/02/2021] [Indexed: 12/23/2022] Open
Abstract
The worldwide health crisis caused by the SARS-CoV-2 virus has resulted in more than 3 million deaths so far. Improving the early screening, diagnosis and prognosis of the disease is critical to helping healthcare professionals save lives during this pandemic. Since WHO declared the COVID-19 outbreak a pandemic, several studies have used Artificial Intelligence techniques to optimize these steps in clinical settings in terms of quality, accuracy and, most importantly, time. The objective of this study is to conduct a systematic literature review of published and preprint reports of Artificial Intelligence models developed and validated for screening, diagnosis and prognosis of coronavirus disease 2019. We included 101 studies, published from January 1st, 2020 to December 30th, 2020, that developed AI prediction models which can be applied in the clinical setting. We identified in total 14 models for screening, 38 diagnostic models for detecting COVID-19 and 50 prognostic models for predicting ICU need, ventilator need, mortality risk, severity assessment or length of hospital stay. Moreover, 43 studies were based on medical imaging and 58 studies on the use of clinical parameters, laboratory results or demographic features. Several heterogeneous predictors derived from multimodal data were identified. These multimodal data, captured from various sources, were analyzed in terms of their prominence within each category of the included studies. Finally, Risk of Bias (RoB) analysis was also conducted to examine the applicability of the included studies in the clinical setting and assist healthcare providers, guideline developers, and policymakers.
Key Words
- ABG, Arterial Blood Gas
- ADA, Adenosine Deaminase
- AI, Artificial Intelligence
- ANN, Artificial Neural Networks
- APTT, Activated Partial Thromboplastin Time
- ARMED, Attribute Reduction with Multi-objective Decomposition Ensemble optimizer
- AUC, Area Under the Curve
- Acc, Accuracy
- Adaboost, Adaptive Boosting
- Apol AI, Apolipoprotein AI
- Apol B, Apolipoprotein B
- Artificial intelligence
- BNB, Bernoulli Naïve Bayes
- BUN, Blood Urea Nitrogen
- CI, Confidence Interval
- CK-MB, Creatine Kinase isoenzyme
- CNN, Convolutional Neural Networks
- COVID-19
- CPP, COVID-19 Positive Patients
- CRP, C-Reactive Protein
- CRT, Classification and Regression Decision Tree
- CoxPH, Cox Proportional Hazards
- DCNN, Deep Convolutional Neural Networks
- DL, Deep Learning
- DLC, Density Lipoprotein Cholesterol
- DNN, Deep Neural Networks
- DT, Decision Tree
- Diagnosis
- ED, Emergency Department
- ESR, Erythrocyte Sedimentation Rate
- ET, Extra Trees
- FCV, Fold Cross Validation
- FL, Federated Learning
- FiO2, Fraction of Inspiration O2
- GBDT, Gradient Boost Decision Tree
- GBM light, Gradient Boosting Machine light
- GDCNN, Genetic Deep Learning Convolutional Neural Network
- GFR, Glomerular Filtration Rate
- GFS, Gradient boosted feature selection
- GGT, Glutamyl Transpeptidase
- GNB, Gaussian Naïve Bayes
- HDLC, High Density Lipoprotein Cholesterol
- INR, International Normalized Ratio
- Inception Resnet, Inception Residual Neural Network
- L1LR, L1 Regularized Logistic Regression
- LASSO, Least Absolute Shrinkage and Selection Operator
- LDA, Linear Discriminant Analysis
- LDH, Lactate Dehydrogenase
- LDLC, Low Density Lipoprotein Cholesterol
- LR, Logistic Regression
- LSTM, Long Short-Term Memory
- MCHC, Mean Corpuscular Hemoglobin Concentration
- MCV, Mean corpuscular volume
- ML, Machine Learning
- MLP, MultiLayer Perceptron
- MPV, Mean Platelet Volume
- MRMR, Maximum Relevance Minimum Redundancy
- Multimodal data
- NB, Naïve Bayes
- NLP, Natural Language Processing
- NPV, Negative Predictive Values
- Nadam optimizer, Nesterov Accelerated Adaptive Moment optimizer
- OB, Occult Blood test
- PCT, Thrombocytocrit
- PPV, Positive Predictive Values
- PWD, Platelet Distribution Width
- PaO2, Arterial Oxygen Tension
- Paco2, Arterial Carbondioxide Tension
- Prognosis
- RBC, Red Blood Cell
- RBF, Radial Basis Function
- RBP, Retinol Binding Protein
- RDW, Red blood cell Distribution Width
- RF, Random Forest
- RFE, Recursive Feature Elimination
- RSV, Respiratory Syncytial Virus
- SEN, Sensitivity
- SG, Specific Gravity
- SMOTE, Synthetic Minority Oversampling Technique
- SPE, Specificity
- SRLSR, Sparse Rescaled Linear Square Regression
- SVM, Support Vector Machine
- SaO2, Arterial Oxygen saturation
- Screening
- TBA, Total Bile Acid
- TTS, Training Test Split
- WBC, White Blood Cell count
- XGB, eXtreme Gradient Boost
- k-NN, K-Nearest Neighbor
|
Review |
4 |
52 |
3
|
CirRNAPL: A web server for the identification of circRNA based on extreme learning machine. Comput Struct Biotechnol J 2020; 18:834-842. [PMID: 32308930 PMCID: PMC7153170 DOI: 10.1016/j.csbj.2020.03.028] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 11/16/2019] [Revised: 03/29/2020] [Accepted: 03/29/2020] [Indexed: 12/27/2022] Open
Abstract
Circular RNA (circRNA) plays an important role in the development of diseases, and it provides a novel idea for drug development. Accurate identification of circRNAs is important for a deeper understanding of their functions. In this study, we developed a new classifier, CirRNAPL, which extracts the features of nucleic acid composition and structure of the circRNA sequence and optimizes the extreme learning machine based on the particle swarm optimization algorithm. We compared CirRNAPL with existing methods, including BLAST, on three datasets and found that CirRNAPL significantly improved identification accuracy on all three, with accuracies of 0.815, 0.802, and 0.782, respectively. Additionally, we performed sequence alignment on 564 sequences of the independent detection set of the third dataset and analyzed the expression level of circRNAs. Results showed the expression level of the sequence is positively correlated with the abundance. A user-friendly CirRNAPL web server is freely available at http://server.malab.cn/CirRNAPL/.
Key Words
- ACC, Accuracy
- CNN, Convolutional Neural Networks
- Circular RNA
- DAC, Dinucleotide-based auto-covariance
- DACC, Dinucleotide-based auto-cross-covariance
- DCC, Dinucleotide-based cross-covariance
- ELM, extreme learning machine
- Expression level
- Extreme learning machine
- GAC, Geary autocorrelation
- Identification
- MAC, Moran autocorrelation
- MCC, Matthews Correlation Coefficient
- MRMD, Maximum-Relevance-Maximum-Distance
- NMBAC, Normalized Moreau–Broto autocorrelation
- PC-PseDNC-General, General parallel correlation pseudo-dinucleotide composition
- PCGs, protein coding genes
- PSO, particle swarm optimization algorithm
- Particle swarm optimization algorithm
- PseDPC, Pseudo-distance structure status pair composition
- PseSSC, Pseudo-structure status composition
- RBF, radial basis function
- RF, random forest
- SC-PseDNC-General, General series correlation pseudo-dinucleotide composition
- SE, Sensitivity
- SP, Specificity
- SVM, support vector machine
- Triplet, Local structure-sequence triplet element
- circRNA, circular RNA
- lncRNAs, long non-coding RNAs
|
Journal Article |
5 |
29 |
4
|
Montalbo FJP. Diagnosing Covid-19 chest x-rays with a lightweight truncated DenseNet with partial layer freezing and feature fusion. Biomed Signal Process Control 2021; 68:102583. [PMID: 33828610 PMCID: PMC8015405 DOI: 10.1016/j.bspc.2021.102583] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/18/2021] [Revised: 03/23/2021] [Accepted: 03/26/2021] [Indexed: 12/26/2022]
Abstract
Due to an unforeseen turn of events, our world has undergone another global pandemic, this time from a highly contagious disease, COVID-19, caused by a novel coronavirus. The novel virus inflames the lungs similarly to pneumonia, making it challenging to diagnose. Currently, the common standard for diagnosing the virus's presence in an individual is a molecular real-time Reverse-Transcription Polymerase Chain Reaction (rRT-PCR) test on fluids acquired through nasal swabs. Such a test is difficult to obtain in most underdeveloped countries, where few experts can perform it. As a substitute, the widely available Chest X-Ray (CXR) became an alternative to rule out the virus. However, such a method does not come easy, as the virus still possesses unknown characteristics that even experienced radiologists and other medical experts find difficult to diagnose through CXRs. Several studies have recently used computer-aided methods to automate and improve such diagnosis of CXRs through Artificial Intelligence (AI) based on computer vision and Deep Convolutional Neural Networks (DCNN), some of which require heavy processing costs and other tedious methods to produce. Therefore, this work proposed the Fused-DenseNet-Tiny, a lightweight DCNN model based on a truncated and concatenated densely connected neural network (DenseNet). The model was trained to learn CXR features based on transfer learning, partial layer freezing, and feature fusion. Upon evaluation, the proposed model achieved 97.99% accuracy with only 1.2 million parameters and a shorter end-to-end structure. It has also shown better performance than some existing studies and other massive state-of-the-art models that diagnosed COVID-19 from CXRs.
Key Words
- AP, Average Pooling
- AUC, Area Under the Curve
- BN, Batch Normalization
- BS, Batch Size
- CAD, Computer-Aided Diagnosis
- CCE, Categorical Cross-Entropy
- CNN, Convolutional Neural Networks
- CT, Computer Tomography
- CV, Computer Vision
- CXR, Chest X-Rays
- Chest x-rays
- Computer-aided diagnosis
- Covid-19
- DCNN, Deep Convolutional Neural Networks
- DL, Deep Learning
- DR, Dropout Rate
- Deep learning
- Densely connected neural networks
- GAP, Global Average Pooling
- GRAD-CAM, Gradient-Weighted Class Activation Maps
- JPG, Joint Photographic Group
- LR, Learning Rate
- MP, Max-Pooling
- P-R, Precision-Recall
- PEPX, Projection-Expansion-Projection-Extension
- ROC, Receiver Operating Characteristic
- ReLU, Rectified Linear Unit
- SGD, Stochastic Gradient Descent
- WHO, World Health Organization
- rRT-PCR, real-time Reverse-Transcription Polymerase Chain Reaction
|
research-article |
4 |
13 |
5
|
Kishimoto T, Takamiya A, Liang KC, Funaki K, Fujita T, Kitazawa M, Yoshimura M, Tazawa Y, Horigome T, Eguchi Y, Kikuchi T, Tomita M, Bun S, Murakami J, Sumali B, Warnita T, Kishi A, Yotsui M, Toyoshiba H, Mitsukura Y, Shinoda K, Sakakibara Y, Mimura M, PROMPT collaborators. The project for objective measures using computational psychiatry technology (PROMPT): Rationale, design, and methodology. Contemp Clin Trials Commun 2020; 19:100649. [PMID: 32913919 PMCID: PMC7473877 DOI: 10.1016/j.conctc.2020.100649] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/27/2020] [Revised: 08/06/2020] [Accepted: 08/16/2020] [Indexed: 01/08/2023] Open
Abstract
INTRODUCTION: Depressive and neurocognitive disorders are debilitating conditions that account for the leading causes of years lived with disability worldwide. However, there are no biomarkers that are objective or easy to obtain in daily clinical practice, which leads to difficulties in assessing treatment response and developing new drugs. New technology allows quantification of features that clinicians perceive as reflective of disorder severity, such as facial expressions, phonic/speech information, body motion, daily activity, and sleep. METHODS: Patients with major depressive disorder, bipolar disorder, and major and minor neurocognitive disorders, as well as healthy controls, are recruited for the study. A psychiatrist/psychologist conducts conversational 10-min interviews with participants ≤10 times within up to five years of follow-up. Interviews are recorded using RGB and infrared cameras and an array microphone. As an option, participants are asked to wear wrist-band type devices during the observational period. Various software is used to process the raw video, voice, infrared, and wearable device data. A machine learning approach is used to predict the presence of symptoms, their severity, and the improvement/deterioration of symptoms. DISCUSSION: The overall goal of this proposed study, the Project for Objective Measures Using Computational Psychiatry Technology (PROMPT), is to develop objective, noninvasive, and easy-to-use biomarkers for assessing the severity of depressive and neurocognitive disorders, in the hope of guiding decision-making in clinical settings as well as reducing the risk of clinical trial failure. Challenges may include the large variability of samples, which makes it difficult to extract the features that commonly reflect disorder severity. TRIAL REGISTRATION: UMIN000021396, University Hospital Medical Information Network (UMIN).
Key Words
- AMED, Japan Agency for Medical Research and Development
- Adabag, Adaptive Bagging
- Adaboost, Adaptive Boosting
- BD, Bipolar disorder
- BDI-II, Beck Depression Inventory, Second Edition
- BNN, Bayesian Neural Networks
- CDR, Clinical Dementia Rating
- CDT, Clock Drawing Test
- CNN, Convolutional Neural Networks
- CPP, cepstral peak prominence
- DSM-5, Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition
- Depression
- F0, fundamental frequency
- F1, F2, F3, first, second, and third formant frequencies
- FedRAMP, Federal Risk and Authorization Management Program
- GCNN, Gated Convolutional Neural Networks
- GDS, Geriatric Depression Scale
- HAM-D, Hamilton Depression Rating Scale
- IEC, International Electrotechnical Commission
- ISO, International Organization for Standardization
- LM, Wechsler Memory Scale-Revised Logical Memory
- LSTM, Long Short-Term Memory Networks
- M.I.N.I., Mini-International Neuropsychiatric Interview
- MADRS, Montgomery-Asberg Depression Rating Scale
- MARS, Motor Agitation and Retardation Scale
- MCI, mild cognitive impairment
- MDD, Major depressive disorder
- MFCC, mel-frequency cepstrum coefficients
- MMSE, Mini-Mental State Examination
- MRI, magnetic resonance imaging
- Machine learning
- MoCA, Montreal Cognitive Assessment
- NPI, Neuropsychiatric Inventory
- Natural language processing
- Neurocognitive disorder
- PET, positron emission tomography
- PROMPT, Project for Objective Measures Using Computational Psychiatry Technology
- PSQI, Pittsburgh Sleep Quality Index
- RF, Random Forest
- RGB, red, green, blue
- SCID, Structured Clinical Interview for DSM-5
- SVM, Support Vector Machine
- SVR, Support Vector Regression
- Screening
- UI, uncertainty interval
- UMIN, University Hospital Medical Information Network
- UV, ultraviolet
- YLDs, years lived with disability
- YMRS, Young Mania Rating Scale
|
research-article |
5 |
6 |