1
Olatunji SO, Alsheikh N, Alnajrani L, Alanazy A, Almusairii M, Alshammasi S, Alansari A, Zaghdoud R, Alahmadi A, Basheer Ahmed MI, Ahmed MS, Alhiyafi J. Comprehensible Machine-Learning-Based Models for the Pre-Emptive Diagnosis of Multiple Sclerosis Using Clinical Data: A Retrospective Study in the Eastern Province of Saudi Arabia. International Journal of Environmental Research and Public Health 2023; 20:4261. [PMID: 36901273 PMCID: PMC10002108 DOI: 10.3390/ijerph20054261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Revised: 02/22/2023] [Accepted: 02/24/2023] [Indexed: 06/18/2023]
Abstract
Multiple Sclerosis (MS) is characterized by chronic deterioration of the nervous system, mainly the brain and the spinal cord. An individual with MS develops the condition when the immune system begins attacking nerve fibers and the myelin sheathing that covers them, affecting the communication between the brain and the rest of the body and eventually causing permanent damage to the nerve. Patients with MS (pwMS) might experience different symptoms depending on which nerve was damaged and how much damage it has sustained. Currently, there is no cure for MS; however, there are clinical guidelines that help control the disease and its accompanying symptoms. Additionally, no specific laboratory biomarker can precisely identify the presence of MS, leaving specialists with a differential diagnosis that relies on ruling out other possible diseases with similar symptoms. Since the emergence of Machine Learning (ML) in the healthcare industry, it has become an effective tool for uncovering hidden patterns that aid in diagnosing several ailments. Several studies have been conducted to diagnose MS using ML and Deep Learning (DL) models trained using MRI images, achieving promising results. However, complex and expensive diagnostic tools are needed to collect and examine imaging data. Thus, the intention of this study is to implement a cost-effective, clinical data-driven model that is capable of diagnosing pwMS. The dataset was obtained from King Fahad Specialty Hospital (KFSH) in Dammam, Saudi Arabia. Several ML algorithms were compared, namely Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and Extra Trees (ET). The results indicated that the ET model outpaced the rest with an accuracy of 94.74%, recall of 97.26%, and precision of 94.67%.
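The three figures reported above (accuracy, recall, precision) all derive from a binary confusion matrix. The sketch below shows how they are computed; the function name and the counts are hypothetical, chosen only for illustration, and are not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, recall, precision) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total   # share of all cases classified correctly
    recall = tp / (tp + fn)        # share of true positive cases that were detected
    precision = tp / (tp + fp)     # share of positive predictions that were correct
    return accuracy, recall, precision


# Hypothetical counts for illustration only (not taken from the paper).
acc, rec, prec = classification_metrics(tp=45, fp=5, fn=3, tn=47)
print(f"accuracy={acc:.4f} recall={rec:.4f} precision={prec:.4f}")
```

In a diagnostic setting, recall is often the figure to watch, since a false negative means a missed case.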
Affiliation(s)
- Sunday O. Olatunji
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Nawal Alsheikh
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Lujain Alnajrani
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Alhatoon Alanazy
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Meshael Almusairii
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Salam Alshammasi
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Aisha Alansari
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Rim Zaghdoud
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Alaa Alahmadi
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Mohammed Imran Basheer Ahmed
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Mohammed Salih Ahmed
  - College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Jamal Alhiyafi
  - Department of Computer Science, Kettering University, Flint, MI 48504, USA
2
Song Y, Chen J, Wang W, Chen G, Ma Z. Double-head transformer neural network for molecular property prediction. J Cheminform 2023; 15:27. [PMID: 36823530 PMCID: PMC9951429 DOI: 10.1186/s13321-023-00700-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 02/16/2023] [Indexed: 02/25/2023]
Abstract
Existing molecular property prediction methods based on deep learning ignore the generalization ability of the nonlinear representation of molecular features and the reasonable assignment of weights of molecular features, making it difficult to further improve the accuracy of molecular property prediction. To solve the above problems, an end-to-end double-head transformer neural network (DHTNN) is proposed in this paper for high-precision molecular property prediction. For the data distribution characteristics of the molecular dataset, DHTNN specially designs a new activation function, beaf, which can greatly improve the generalization ability of the nonlinear representation of molecular features. A residual network is introduced in the molecular encoding part to solve the gradient explosion problem and ensure that the model can converge quickly. The transformer based on double-head attention is used to extract molecular intrinsic detail features, and the weights are reasonably assigned for predicting molecular properties with high accuracy. Our model, which was tested on the MoleculeNet [1] benchmark dataset, showed significant performance improvements over other state-of-the-art methods.
Affiliation(s)
- Yuanbing Song
  - College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China
- Jinghua Chen
  - College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China
- Wenju Wang
  - College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China
- Gang Chen
  - College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China
- Zhichong Ma
  - College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China
3
Khan A, Uddin J, Ali F, Kumar H, Alghamdi W, Ahmad A. AFP-SPTS: An Accurate Prediction of Antifreeze Proteins Using Sequential and Pseudo-Tri-Slicing Evolutionary Features with an Extremely Randomized Tree. J Chem Inf Model 2023; 63:826-834. [PMID: 36649569 DOI: 10.1021/acs.jcim.2c01417] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 01/19/2023]
Abstract
The development of intracellular ice in the bodies of cold-blooded living organisms may cause them to die. These species yield antifreeze proteins (AFPs) to live in subzero temperature environments. Additionally, AFPs are implemented in biotechnological, industrial, agricultural, and medical fields. Machine learning-based predictors were presented for AFP identification. However, more accurate predictors are still highly desirable for boosting the AFP prediction. This work presents a novel approach, named AFP-SPTS, for the correct prediction of AFPs. We explored the discriminative features with four schemes, namely, dipeptide deviation from the expected mean (DDE), reduced amino acid alphabet (RAAA), grouped dipeptide composition (GDPC), and a novel representative method, called pseudo-position-specific scoring matrix tri-slicing (PseTS-PSSM). Considering the advantages of ensemble learning strategy, we fused each feature vector into different combinations and trained the models with five machine learning algorithms, i.e., multilayer perceptron (MLP), extremely randomized tree (ERT), decision tree (DT), random forest (RF), and AdaBoost. Among all models, PseTS-PSSM + RAAA with an extremely randomized tree attained the best outcomes. The proposed predictor (AFP-SPTS) boosted the accuracies of AFPs in the literature by 1.82 and 4.1%.
Affiliation(s)
- Adnan Khan
  - Qurtuba University of Science and Information Technology, Peshawar 5000, Khyber Pakhtunkhwa, Pakistan
- Jamal Uddin
  - Qurtuba University of Science and Information Technology, Peshawar 5000, Khyber Pakhtunkhwa, Pakistan
- Farman Ali
  - Sarhad University of Science and Information Technology, Mardan Campus, Peshawar 23200, Pakistan
  - Department of Elementary and Secondary Education Department, Government of Khyber Pakhtunkhwa, Peshawar 5000, Khyber Pakhtunkhwa, Pakistan
- Harish Kumar
  - Department of Computer Science, College of Computer Science, King Khalid University, Abha 61421, Saudi Arabia
- Wajdi Alghamdi
  - Department of Information Technology, Faculty of Computing and Information Technology, King AbdulAziz University, Jeddah 21589, Saudi Arabia
- Aftab Ahmad
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
4
Application of Extremely Randomised Trees for exploring influential factors on variant crash severity data. Sci Rep 2022; 12:11476. [PMID: 35798814 PMCID: PMC9263179 DOI: 10.1038/s41598-022-15693-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/02/2021] [Accepted: 06/28/2022] [Indexed: 11/08/2022]
Abstract
Crash severity models play a crucial role in evaluating the influencing factors in the severity of traffic crashes. In this study, Extremely Randomised Tree (ERT) is used as a machine learning technique to analyse the severity of crashes. The crash data in the province of Khorasan Razavi, Iran, for a period of 5 years from 2013 to 2017, is used for crash severity model development. The dataset includes traffic-related variables, vehicle specifications, vehicle movement, land use characteristics, temporal characteristics, and environmental variables. In this paper, Feature Importance Analysis (FIA), Partial Dependence Plots (PDP), and Individual Conditional Expectation (ICE) plots are utilised to analyse and interpret the results. According to the results, the involvement of vulnerable road users such as motorcyclists and pedestrians alongside traffic-related variables are among the most significant variables in crash severity. Results show that the presence of motorcycles can increase the probability of injury crashes by around 30% and almost double the probability of fatal crashes. Analysing the interaction of PDPs shows that driving speeds above 60 km/h in residential areas raises the probability of injury crashes by about 10%. In addition, at speeds higher than 70 km/h, the presence of pedestrians approximately increases the probability of fatal crashes by 6%.
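The relationship between the ICE and PDP plots mentioned above can be sketched in a few lines: an ICE curve sweeps one feature for a single record, and the PDP is the pointwise average of those curves. The model `toy_predict`, its coefficients, and the feature names below are hypothetical stand-ins, not the fitted ERT model or variables from the study.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: the prediction as `feature` is swept over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # override only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """PDP: the pointwise average of the ICE curves."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(column) / len(column) for column in zip(*curves)]


def toy_predict(row):
    """Hypothetical stand-in for a fitted classifier's predicted probability."""
    risk = 0.1 + 0.004 * row["speed_kmh"]
    if row["motorcycle"]:
        risk += 0.2
    return min(risk, 1.0)


rows = [{"speed_kmh": 40, "motorcycle": 0}, {"speed_kmh": 80, "motorcycle": 1}]
grid = [30, 60, 90]
pdp = partial_dependence(toy_predict, rows, "speed_kmh", grid)
```

Plotting the individual ICE curves alongside the PDP is what reveals interactions: if the curves are not parallel, the effect of the swept feature depends on the other features.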
5
Zhao ZW, Del Cueto M, Troisi A. Limitations of machine learning models when predicting compounds with completely new chemistries: possible improvements applied to the discovery of new non-fullerene acceptors. Digital Discovery 2022; 1:266-276. [PMID: 35769202 PMCID: PMC9189862 DOI: 10.1039/d2dd00004k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2022] [Accepted: 03/23/2022] [Indexed: 11/21/2022]
Abstract
We try to determine if machine learning (ML) methods, applied to the discovery of new materials on the basis of existing data sets, have the power to predict completely new classes of compounds (extrapolating) or perform well only when interpolating between known materials. We introduce the leave-one-group-out cross-validation, in which the ML model is trained to explicitly perform extrapolations of unseen chemical families. This approach can be used across materials science and chemistry problems to improve the added value of ML predictions, instead of using extrapolative ML models that were trained with a regular cross-validation. We consider as a case study the problem of the discovery of non-fullerene acceptors because novel classes of acceptors are naturally classified into distinct chemical families. We show that conventional ML methods are not useful in practice when attempting to predict the efficiency of a completely novel class of materials. The approach proposed in this work increases the accuracy of the predictions to enable at least the categorization of materials with a performance above and below the median value.
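The leave-one-group-out scheme described above holds out every member of one chemical family at a time, so each test fold contains only chemistries the model never saw during training. A minimal sketch follows; the function name and the family labels are hypothetical. scikit-learn offers an equivalent splitter, `LeaveOneGroupOut` in `sklearn.model_selection`, though the abstract does not say which implementation the authors used.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical family label for each compound in a data set of six.
families = ["A", "A", "B", "C", "B", "C"]
splits = list(leave_one_group_out(families))

for held_out, train, test in splits:
    print(f"hold out family {held_out}: train={train} test={test}")
```

Unlike a random k-fold split, no family ever appears on both sides of a fold, which is what makes the resulting score a measure of extrapolation.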
Affiliation(s)
- Zhi-Wen Zhao
  - Department of Chemistry, University of Liverpool, Liverpool L69 3BX, UK
  - Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University, Changchun 130024, Jilin, P. R. China
- Marcos Del Cueto
  - Department of Chemistry, University of Liverpool, Liverpool L69 3BX, UK
- Alessandro Troisi
  - Department of Chemistry, University of Liverpool, Liverpool L69 3BX, UK
6
Extracting Information on Rocky Desertification from Satellite Images: A Comparative Study. Remote Sensing 2021. [DOI: 10.3390/rs13132497] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/16/2022]
Abstract
Rocky desertification occurs in many karst terrains of the world and poses major challenges for regional sustainable development. Remotely sensed data can provide important information on rocky desertification. In this study, three common open-access satellite image datasets (Sentinel-2B, Landsat-8, and Gaofen-6) were used for extracting information on rocky desertification in a typical karst region (Guangnan County, Yunnan) of southwest China, using three machine-learning algorithms implemented in the Python programming language: random forest (RF), bagged decision tree (BDT), and extremely randomized trees (ERT). Comparative analyses of the three data sources and three algorithms show that: (1) The Sentinel-2B image has the best capability for extracting rocky desertification information, with an overall accuracy (OA) of 85.21% using the ERT method. This can be attributed to the higher spatial resolution of the Sentinel-2B image than that of Landsat-8 and Gaofen-6 images and Gaofen-6’s lack of the shortwave infrared (SWIR) bands suitable for mapping carbonate rocks. (2) The ERT method has the best classification results of rocky desertification. Compared with the RF and BDT methods, the ERT method has stronger randomness in modeling and can effectively identify important feature factors for extracting information on rocky desertification. (3) The combination of the Sentinel-2B images and the ERT method provides an effective, efficient, and free approach to information extraction for mapping rocky desertification. The study can provide a useful reference for effective mapping of rocky desertification in similar karst environments of the world, in terms of both satellite image sources and classification algorithms. It also provides important information on the total area and spatial distribution of different levels of rocky desertification in the study area to support decision making by local governments for sustainable development.
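The "stronger randomness" attributed to ERT above refers to how split thresholds are chosen. In the extremely randomized trees algorithm, a candidate cut-point for a feature is drawn uniformly at random between that feature's minimum and maximum in the node, whereas a classic decision tree or random forest evaluates cut-points derived from the sorted data. The sketch below contrasts the two for a single feature; the function names and reflectance values are hypothetical, and it illustrates the general technique rather than the study's code.

```python
import random


def random_split_threshold(values, rng):
    """Extra-Trees style: one cut-point drawn uniformly between min and max."""
    return rng.uniform(min(values), max(values))


def exhaustive_split_candidates(values):
    """Classic tree style: midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


rng = random.Random(0)             # fixed seed so the sketch is repeatable
band = [0.12, 0.35, 0.40, 0.81]    # hypothetical reflectance values in one node
threshold = random_split_threshold(band, rng)
candidates = exhaustive_split_candidates(band)
```

Drawing a single random threshold per candidate feature avoids the sort-and-scan step, which is one reason extra trees are usually cheaper to train than a random forest of the same size.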
7
Adaptive ranking based ensemble learning of Gaussian process regression models for quality-related variable prediction in process industries. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.107060] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/24/2022]
8
Climate and land use change induced future flood susceptibility assessment in a sub-tropical region of India. Soft Comput 2021. [DOI: 10.1007/s00500-021-05584-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 10/22/2022]
9
Munshi J, Chen W, Chien T, Balasubramanian G. Transfer Learned Designer Polymers For Organic Solar Cells. J Chem Inf Model 2021; 61:134-142. [PMID: 33410685 DOI: 10.1021/acs.jcim.0c01157] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/19/2022]
Abstract
Organic photovoltaic (OPV) materials have been examined extensively over the past two decades for solar cell applications because of the potential for device flexibility, low-temperature solution processability, and negligible environmental impact. However, discovery of new candidate OPV materials, especially polymer-based electron donors, that demonstrate notable power conversion efficiencies (PCEs) is a nontrivial and time-intensive exercise given the extensive set of possible chemistries. Recent progress in machine-learning-accelerated materials discovery has helped to address this challenge, with molecular line representations, such as Simplified Molecular-Input Line-Entry Systems (SMILES), gaining popularity as molecular fingerprints describing the donor chemical structures. Here, we employ a transfer-learning-based recurrent neural network (LSTM) model, which harnesses the SMILES molecular fingerprints as an input to generate novel designer chemistries for OPV devices. The generative model, perfected on a small focused OPV data set, predicts new polymer repeat units with potentially high PCE. Calculations of the similarity coefficient between the known and the generated polymers corroborate the accuracy of the model predictability as a function of the underlying chemical specificity. The data-enabled framework is sufficiently generic for use in accelerated machine-learned materials discovery for various chemistries and applications, mining the hitherto available experimental and computational data.
Affiliation(s)
- Joydeep Munshi
  - Department of Mechanical Engineering & Mechanics, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Wei Chen
  - Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
- TeYu Chien
  - Department of Physics & Astronomy, University of Wyoming, Laramie, Wyoming 82071, United States
- Ganesh Balasubramanian
  - Department of Mechanical Engineering & Mechanics, Lehigh University, Bethlehem, Pennsylvania 18015, United States
10
Flash Flood Susceptibility Modeling Using New Approaches of Hybrid and Ensemble Tree-Based Machine Learning Algorithms. Remote Sensing 2020. [DOI: 10.3390/rs12213568] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Indexed: 11/16/2022]
Abstract
Flash flooding is considered one of the most dynamic natural disasters for which measures need to be taken to minimize economic damages, adverse effects, and consequences by mapping flood susceptibility. Identifying areas prone to flash flooding is a crucial step in flash flood hazard management. In the present study, the Kalvan watershed in Markazi Province, Iran, was chosen to evaluate the flash flood susceptibility modeling. Thus, to detect flash flood-prone zones in this study area, five machine learning (ML) algorithms were tested. These included boosted regression tree (BRT), random forest (RF), parallel random forest (PRF), regularized random forest (RRF), and extremely randomized trees (ERT). Fifteen climatic and geo-environmental variables were used as inputs of the flash flood susceptibility models. The results showed that ERT was the best-performing model, with an area under the curve (AUC) value of 0.82. The AUC values of the remaining models, i.e., RRF, PRF, RF, and BRT, were 0.80, 0.79, 0.78, and 0.75, respectively. In the ERT model, the areal coverage of very high to moderate flash flood susceptibility was 582.56 km² (28.33%), and the remainder was associated with very low to low susceptibility zones. It is concluded that topographical and hydrological parameters, e.g., altitude, slope, rainfall, and distance to the river, were the most effective parameters. The results of this study will play a vital role in the planning and implementation of flood mitigation strategies in the region.
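The AUC values used above to rank the models can be read as the probability that a randomly chosen flooded site receives a higher susceptibility score than a randomly chosen non-flooded site. The sketch below computes AUC directly from that definition, counting ties as half; the function name, labels, and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0    # positive correctly ranked above negative
            elif p == n:
                wins += 0.5    # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = flooded) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the sample size; it is meant to make the definition explicit, and a rank-based formula would be the usual choice for large data sets.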
11
Soil Erosion Susceptibility Mapping in Kozetopraghi Catchment, Iran: A Mixed Approach Using Rainfall Simulator and Data Mining Techniques. Land 2020. [DOI: 10.3390/land9100368] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/17/2022]
Abstract
Soil erosion determines landforms, soil formation and distribution, soil fertility, and land degradation processes. In arid and semiarid ecosystems, soil erosion is a key process to understand, foresee, and prevent desertification. Addressing soil erosion at watershed scales requires basic information to develop soil erosion control strategies and to reduce land degradation. To assess and remediate the non-sustainable soil erosion rates, restoration programs benefit from the knowledge of the spatial distribution of the soil losses to develop maps of soil erosion. This study presents Support Vector Machine (SVM), Random Forest (RF), and adaptive boosting (AdaBoost) data mining models to map soil erosion susceptibility in Kozetopraghi watershed, Iran. A soil erosion inventory map was prepared from field rainfall simulation experiments on 174 randomly selected points along the Kozetopraghi watershed. In previous studies, this map has been prepared using indirect methods such as the Universal Soil Loss Equation to assess soil erosion. Direct field measurements for mapping soil erosion susceptibility have not previously been carried out at our study site. The soil erosion rate data generated by simulated rainfall in 1 m² plots at a rainfall rate of 40 mm h⁻¹ were used to develop the soil erosion map. Of the available data, 70% and 30% were randomly assigned to calibrate and validate the models, respectively. As a result, the RF model, with the highest area under the curve (AUC) value in a receiver operating characteristics (ROC) curve (0.91) and the lowest mean square error (MSE) value (0.09), has the most concordance and spatial differentiation. Sensitivity analysis by Jackknife and IncNodePurity methods indicates that the slope angle is the most important factor within the soil erosion susceptibility map. The RF susceptibility map showed that the areas located in the center and near the watershed outlet have the most susceptibility to soil erosion. This information can be used to support the development of sustainable restoration plans with more accuracy. Our methodology has been evaluated and can also be applied in other regions.
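The random 70%/30% calibration and validation split and the MSE criterion mentioned above can be sketched as follows. The count of 174 points comes from the abstract; the function names, the seed, and the use of bare point indices are assumptions made for illustration.

```python
import random


def split_70_30(items, seed=0):
    """Shuffle a copy of `items` and split it 70/30 (calibration, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


points = list(range(174))    # indices of the 174 sampling points
calibration, validation = split_70_30(points)
print(len(calibration), len(validation))
```

Fixing the seed keeps the split reproducible, so that competing models are calibrated and validated on exactly the same points.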
12
Venkatraman V. Evaluation of Molecular Fingerprints for Determining Dye Aggregation on Semiconductor Surfaces. Mol Inform 2020; 41:e2000062. [PMID: 32476288 DOI: 10.1002/minf.202000062] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/03/2020] [Accepted: 05/31/2020] [Indexed: 01/19/2023]
Abstract
Dye aggregation plays an important role in determining the photovoltaic performance of dye-sensitized solar cells. Compared with the spectra observed in solution, it is, a priori, difficult to ascertain whether a dye is likely to show hypsochromic (H) or bathochromic (J) aggregation until after adsorption onto the semiconductor electrode. Herein, we show that molecular fingerprint-based methods provide a fast and efficient way to discriminate between H- and J-aggregating dyes. The efficacy of the fingerprint-based classification models is demonstrated with a diverse set of over 3000 organic dyes dissolved in different solvents. Requiring only the structure of the dye and the polarity of the solvent used, the machine learning model achieves close to 80% classification accuracies that are comparable with models based on a combination of fragment counts and topological indices. For interested researchers, we have bundled the prediction tools as an R package.
13
Govindaraj RG, Subramaniyam S, Manavalan B. Extremely-randomized-tree-based Prediction of N6-Methyladenosine Sites in Saccharomyces cerevisiae. Curr Genomics 2020; 21:26-33. [PMID: 32655295 PMCID: PMC7324895 DOI: 10.2174/1389202921666200219125625] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 11/01/2019] [Revised: 12/28/2019] [Accepted: 01/24/2020] [Indexed: 02/07/2023]
Abstract
Introduction: N6-methyladenosine (m6A) is one of the most common post-transcriptional modifications in RNA, which has been related to several biological processes. The accurate prediction of m6A sites from RNA sequences is one of the challenging tasks in computational biology. Several computational methods utilizing machine-learning algorithms have been proposed that accelerate in silico screening of m6A sites, thereby drastically reducing the experimental time and labor costs involved. Methodology: In this study, we proposed a novel computational predictor, termed ERT-m6Apred, for the accurate prediction of m6A sites. To identify the feature encodings with more discriminative capability, we applied a two-step feature selection technique on seven different feature encodings and identified the corresponding optimal feature set. Results: Subsequently, performance comparison of the corresponding optimal feature set-based extremely randomized tree model revealed that Pseudo k-tuple composition encoding, which includes 14 physicochemical properties, significantly outperformed other encodings. Moreover, ERT-m6Apred achieved an accuracy of 78.84% during cross-validation analysis, which is comparatively better than recently reported predictors. Conclusion: In summary, ERT-m6Apred predicts Saccharomyces cerevisiae m6A sites with higher accuracy, thus facilitating biological hypothesis generation and experimental validations.
Affiliation(s)
- Rajiv G Govindaraj
  - (1) HotSpot Therapeutics, 50 Milk Street, 16 Floor, Boston, MA 02109, USA; (2) Research and Development Center, In-silicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea; (3) Department of Biotechnology, Dr. N.G.P. Arts and Science College, Coimbatore, Tamil Nadu 641048, India; (4) Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Sathiyamoorthy Subramaniyam
  - (1) HotSpot Therapeutics, 50 Milk Street, 16 Floor, Boston, MA 02109, USA; (2) Research and Development Center, In-silicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea; (3) Department of Biotechnology, Dr. N.G.P. Arts and Science College, Coimbatore, Tamil Nadu 641048, India; (4) Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Balachandran Manavalan
  - (1) HotSpot Therapeutics, 50 Milk Street, 16 Floor, Boston, MA 02109, USA; (2) Research and Development Center, In-silicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea; (3) Department of Biotechnology, Dr. N.G.P. Arts and Science College, Coimbatore, Tamil Nadu 641048, India; (4) Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
14
Sahu H, Ma H. Unraveling Correlations between Molecular Properties and Device Parameters of Organic Solar Cells Using Machine Learning. J Phys Chem Lett 2019; 10:7277-7284. [PMID: 31702163 DOI: 10.1021/acs.jpclett.9b02772] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 06/10/2023]
Abstract
Understanding the relationships between molecular properties and device parameters is highly desired not only to improve the overall performance of an organic solar cell but also to fulfill the requirements of a device for a particular application such as solar-to-fuel energy conversion (high open-circuit voltage (VOC)) or solar window applications (high short circuit current (JSC)). In this work, a series of machine learning models are built for three important device characteristics (VOC, JSC, and fill factor) using 13 crucial molecular properties as descriptors, resulting in an impressive predictive performance (r = 0.7). These models may play a vital role in designing promising organic materials for a specific photovoltaic application with high VOC/JSC. The importance of descriptors for each device parameter is unraveled, which may assist in tuning them and improve understanding of the energy conversion process.
Affiliation(s)
- Harikrishna Sahu
  - Key Laboratory of Mesoscopic Chemistry of MOE, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China
- Haibo Ma
  - Key Laboratory of Mesoscopic Chemistry of MOE, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China