1
Zhang J, Mucs D, Norinder U, Svensson F. LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity-Application to the Tox21 and Mutagenicity Data Sets. J Chem Inf Model 2019; 59:4150-4158. [PMID: 31560206 DOI: 10.1021/acs.jcim.9b00633] [Citation(s) in RCA: 93] [Impact Index Per Article: 18.6] [Indexed: 12/31/2022]
Abstract
Machine learning algorithms have attained widespread use in assessing the potential toxicities of pharmaceuticals and industrial chemicals because they are faster and cheaper than experimental bioassays. Gradient boosting is an effective algorithm that often achieves high predictivity, but historically its relatively long computational time limited its application to predicting large compound libraries or to developing in silico predictive models that require frequent retraining. LightGBM, a recent improvement of the gradient boosting algorithm, inherits its high predictivity but resolves its scalability and computational-time problems by adopting a leaf-wise tree growth strategy and introducing novel techniques. In this study, we compared the predictive performance and the computational time of LightGBM to those of deep neural networks, random forests, support vector machines, and XGBoost. All algorithms were rigorously evaluated on publicly available Tox21 and mutagenicity data sets using a nested 10-fold cross-validation scheme with integrated Bayesian optimization, which performs hyperparameter optimization while examining model generalizability and transferability to new data. The evaluation results demonstrated that LightGBM is an effective and highly scalable algorithm, offering the best predictive performance while requiring significantly less computational time than the other investigated algorithms across all Tox21 and mutagenicity data sets. We recommend LightGBM for in silico safety assessment and for other areas of cheminformatics, to meet the ever-growing demand for accurate and rapid prediction of toxicity- or activity-related end points for the large compound libraries found in the pharmaceutical and chemical industries.
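The nested cross-validation scheme mentioned in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract gives no implementation details, the names `k_fold_indices`, `nested_cv`, and `fit_and_score` are hypothetical, and a plain search over a fixed candidate list stands in for the Bayesian optimization used in the study.

```python
import random
from statistics import mean


def k_fold_indices(indices, k, seed=0):
    """Shuffle the indices and yield (train, test) index lists for k folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_cv(n_samples, candidates, fit_and_score, outer_k=10, inner_k=10):
    """Return one held-out score per outer fold.

    fit_and_score(params, train_idx, test_idx) -> float is a hypothetical
    callback supplied by the caller; it trains on train_idx and scores on
    test_idx. Candidate search replaces Bayesian optimization here.
    """
    outer_scores = []
    for outer_train, outer_test in k_fold_indices(range(n_samples), outer_k):
        # Inner loop: pick hyperparameters using only the outer training part.
        def inner_score(params):
            return mean(
                fit_and_score(params, tr, va)
                for tr, va in k_fold_indices(outer_train, inner_k, seed=1)
            )

        best = max(candidates, key=inner_score)
        # Outer loop: refit with the chosen parameters and score on the
        # outer test fold, which played no part in the selection.
        outer_scores.append(fit_and_score(best, outer_train, outer_test))
    return outer_scores
```

Because the outer test fold never influences hyperparameter selection, the outer scores estimate performance on new data rather than on data the tuning step has already seen.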
Affiliation(s)
- Jin Zhang
  - Department of Chemistry, Umeå University, SE-901 87 Umeå, Sweden
- Daniel Mucs
  - Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Forskargatan 20, SE-151 36 Södertälje, Sweden
- Ulf Norinder
  - Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Forskargatan 20, SE-151 36 Södertälje, Sweden
  - Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden
- Fredrik Svensson
  - The Alzheimer's Research UK University College London Drug Discovery Institute, The Cruciform Building, Gower Street, London WC1E 6BT, U.K.
  - The Francis Crick Institute, 1 Midland Road, London NW1 1AT, U.K.
2
Inferring Genes and Biological Functions That Are Sensitive to the Severity of Toxicity Symptoms. Int J Mol Sci 2017; 18:ijms18040755. [PMID: 28368331 PMCID: PMC5412340 DOI: 10.3390/ijms18040755] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/26/2016] [Revised: 03/23/2017] [Accepted: 03/30/2017] [Indexed: 11/16/2022]
Abstract
The effective development of new drugs relies on the identification of genes that are related to the symptoms of toxicity. Although many researchers have inferred toxicity markers, most have focused on discovering toxicity occurrence markers rather than toxicity severity markers. In this study, we aimed to identify gene markers that are relevant to both the occurrence and severity of toxicity symptoms. To identify gene markers for each of four targeted liver toxicity symptoms, we used microarray expression profiles and pathology data from 14,143 in vivo rat samples. The gene markers were found using sparse linear discriminant analysis (sLDA) in which symptom severity is used as a class label. To evaluate the inferred gene markers, we constructed regression models that predicted the severity of toxicity symptoms from gene expression profiles. Our cross-validated results revealed that our approach was more successful at finding gene markers sensitive to the aggravation of toxicity symptoms than conventional methods. Moreover, these markers were closely involved in some of the biological functions significantly related to toxicity severity in the four targeted symptoms.
3
Rezaei Kolahchi A, Khadem Mohtaram N, Pezeshgi Modarres H, Mohammadi MH, Geraili A, Jafari P, Akbari M, Sanati-Nezhad A. Microfluidic-Based Multi-Organ Platforms for Drug Discovery. Micromachines 2016; 7:E162. [PMID: 30404334 PMCID: PMC6189912 DOI: 10.3390/mi7090162] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 05/24/2016] [Revised: 08/23/2016] [Accepted: 08/24/2016] [Indexed: 12/18/2022]
Abstract
Developing predictive multi-organ models before implementing costly clinical trials is central to screening the toxicity, efficacy, and side effects of new therapeutic agents. Despite significant recent efforts to develop biomimetic in vitro tissue models, the clinical application of such platforms is still far from reality. Recent advances in physiologically-based pharmacokinetic and pharmacodynamic (PBPK-PD) modeling, micro- and nanotechnology, and in silico modeling have enabled single- and multi-organ platforms for investigating new chemical agents and tissue-tissue interactions. This review provides an overview of the principles of designing microfluidic-based organ-on-chip models for drug testing and highlights the current state of the art in developing predictive multi-organ models for studying the cross-talk of interconnected organs. We further discuss the challenges associated with establishing a predictive body-on-chip (BOC) model, such as scaling, cell types, the common medium, and the principles of study design for characterizing the interaction of drugs with multiple targets.
Affiliation(s)
- Ahmad Rezaei Kolahchi
  - BioMEMS and Bioinspired Microfluidic Laboratory, Department of Mechanical and Manufacturing Engineering, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada.
- Nima Khadem Mohtaram
  - Laboratory for Innovations in MicroEngineering (LiME), Department of Mechanical Engineering, University of Victoria, Victoria, BC V8P 5C2, Canada.
  - Division of Medical Sciences, University of Victoria, Victoria, BC V8P 5C2, Canada.
  - Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA.
- Hassan Pezeshgi Modarres
  - BioMEMS and Bioinspired Microfluidic Laboratory, Department of Mechanical and Manufacturing Engineering, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada.
- Mohammad Hossein Mohammadi
  - Department of Chemical and Petroleum Engineering, Sharif University of Technology, Azadi Ave., Tehran 11155-9516, Iran.
- Armin Geraili
  - Department of Chemical and Petroleum Engineering, Sharif University of Technology, Azadi Ave., Tehran 11155-9516, Iran.
- Parya Jafari
  - Department of Electrical Engineering, Sharif University of Technology, Azadi Ave., Tehran 11155-9516, Iran.
- Mohsen Akbari
  - Laboratory for Innovations in MicroEngineering (LiME), Department of Mechanical Engineering, University of Victoria, Victoria, BC V8P 5C2, Canada.
  - Division of Medical Sciences, University of Victoria, Victoria, BC V8P 5C2, Canada.
- Amir Sanati-Nezhad
  - BioMEMS and Bioinspired Microfluidic Laboratory, Department of Mechanical and Manufacturing Engineering, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada.
  - Center for Bioengineering Research and Education, Biomedical Engineering Program, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada.
4
Kim J, Shin M. An integrative model of multi-organ drug-induced toxicity prediction using gene-expression data. BMC Bioinformatics 2014; 15 Suppl 16:S2. [PMID: 25522097 PMCID: PMC4290650 DOI: 10.1186/1471-2105-15-s16-s2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
Background: In practice, some drugs produce a number of negative biological effects that can reduce their effectiveness as a remedy. To address this issue, several studies have predicted drug-induced toxicity from gene-expression data, and a significant amount of work has been done on predicting limited drug-induced symptoms or single-organ toxicity. However, since drugs often injure several organs, such as the liver or kidney, it would be very useful to forecast drug-induced injuries for multiple organs. Therefore, in this work, our aim was to develop a multi-organ toxicity prediction model using an integrative model of gene-expression data.
Results: To train our integrative model, we used 3708 in vivo samples of gene-expression profiles exposed to one of 41 drugs related to 21 distinct physiological changes divided between liver and kidney (liver 11, kidney 10). Specifically, we used the gene-expression profiles to learn an ensemble classifier for each of 21 pathology prediction models. Subsequently, these classifiers were combined with weights to generate an integrative model for each pathological finding. The integrative model outputs the likelihood that a given test gene-expression profile presents the trained pathology, called an integrative prediction score (IPS). To evaluate an integrative model, we estimated its prediction performance with k-fold cross-validation. Our results demonstrate that the proposed integrative model is superior to individual pathology prediction models in predicting multi-organ drug-induced toxicities over all the targeted pathological findings. On average, the AUC of the integrative models was 88% while the AUC of individual pathology prediction models was 68%.
Conclusions: Not only does this integrative model produce prediction performance comparable to existing approaches, but it also produces very stable performance overall. In addition, our approach is easily expandable to a variety of other multi-organ toxicology applications.
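The idea of combining per-pathology classifier outputs with weights, and then comparing models by AUC, can be illustrated with a small sketch. The abstract does not say how the weights are derived or how the IPS is normalized, so the weighted average below is an assumption, and `integrative_score` and `auc` are hypothetical helper names rather than the authors' implementation.

```python
def integrative_score(scores, weights):
    """Weighted average of per-classifier scores for one sample.

    Assumption: the combination is a normalized weighted mean; the
    abstract only states that classifiers are "combined with weights".
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def auc(labels, scores):
    """AUC as the probability that a positive sample outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Under these assumptions, each test sample would be scored once with `integrative_score`, and `auc` would then be computed over the held-out samples of each cross-validation fold.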