1
Bassan A, Steigerwalt R, Keller D, Beilke L, Bradley PM, Bringezu F, Brock WJ, Burns-Naas LA, Chambers J, Cross K, Dorato M, Elespuru R, Fuhrer D, Hall F, Hartke J, Jahnke GD, Kluxen FM, McDuffie E, Schmidt F, Valentin JP, Woolley D, Zane D, Myatt GJ. Developing a pragmatic consensus procedure supporting the ICH S1B(R1) weight of evidence carcinogenicity assessment. Frontiers in Toxicology 2024; 6:1370045. [PMID: 38646442 PMCID: PMC11027748 DOI: 10.3389/ftox.2024.1370045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Accepted: 03/04/2024] [Indexed: 04/23/2024]
Abstract
The ICH S1B carcinogenicity global testing guideline has been recently revised with a novel addendum that describes a comprehensive integrated Weight of Evidence (WoE) approach to determine the need for a 2-year rat carcinogenicity study. In the present work, experts from different organizations have joined efforts to standardize as much as possible a procedural framework for the integration of evidence associated with the different ICH S1B(R1) WoE criteria. The framework uses a pragmatic consensus procedure for carcinogenicity hazard assessment to facilitate transparent, consistent, and documented decision-making and it discusses best-practices both for the organization of studies and presentation of data in a format suitable for regulatory review. First, it is acknowledged that the six WoE factors described in the addendum form an integrated network of evidence within a holistic assessment framework that is used synergistically to analyze and explain safety signals. Second, the proposed standardized procedure builds upon different considerations related to the primary sources of evidence, mechanistic analysis, alternative methodologies and novel investigative approaches, metabolites, and reliability of the data and other acquired information. Each of the six WoE factors is described highlighting how they can contribute evidence for the overall WoE assessment. A suggested reporting format to summarize the cross-integration of evidence from the different WoE factors is also presented. This work also notes that even if a 2-year rat study is ultimately required, creating a WoE assessment is valuable in understanding the specific factors and levels of human carcinogenic risk better than have been identified previously with the 2-year rat bioassay alone.
Affiliation(s)
- Douglas Keller
- Independent Consultant, Kennett Square, PA, United States
- Lisa Beilke
- Toxicology Solutions, Inc., Marana, AZ, United States
- Frank Bringezu
- Chemical and Preclinical Safety, Merck Healthcare KGaA, Darmstadt, Germany
- William J. Brock
- Brock Scientific Consulting, LLC, Hilton Head, SC, United States
- Douglas Fuhrer
- BioXcel Therapeutics, Inc., New Haven, CT, United States
- Jim Hartke
- Gilead Sciences, Inc., Foster City, CA, United States
- Eric McDuffie
- Neurocrine Bioscience, Inc., San Diego, CA, United States
- Doris Zane
- Gilead Sciences, Inc., Foster City, CA, United States
2
Tran TTV, Surya Wibowo A, Tayara H, Chong KT. Artificial Intelligence in Drug Toxicity Prediction: Recent Advances, Challenges, and Future Perspectives. J Chem Inf Model 2023; 63:2628-2643. [PMID: 37125780 DOI: 10.1021/acs.jcim.3c00200] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 05/02/2023]
Abstract
Toxicity prediction is a critical step in the drug discovery process that helps identify and prioritize compounds with the greatest potential for safe and effective use in humans, while also reducing the risk of costly late-stage failures. It is estimated that over 30% of drug candidates are discarded owing to toxicity. Recently, artificial intelligence (AI) has been used to improve drug toxicity prediction as it provides more accurate and efficient methods for identifying the potentially toxic effects of new compounds before they are tested in human clinical trials, thus saving time and money. In this review, we present an overview of recent advances in AI-based drug toxicity prediction, including the use of various machine learning algorithms and deep learning architectures, of six major toxicity properties and Tox21 assay end points. Additionally, we provide a list of public data sources and useful toxicity prediction tools for the research community and highlight the challenges that must be addressed to enhance model performance. Finally, we discuss future perspectives for AI-based drug toxicity prediction. This review can aid researchers in understanding toxicity prediction and pave the way for new methods of drug discovery.
Affiliation(s)
- Thi Tuyet Van Tran
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Faculty of Information Technology, An Giang University, Long Xuyen 880000, Vietnam
- Vietnam National University - Ho Chi Minh City, Ho Chi Minh 700000, Vietnam
- Agung Surya Wibowo
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Department of Electrical Engineering, Telkom University, Bandung 40257, Indonesia
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
- Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
3
Li T, Tong W, Roberts R, Liu Z, Thakkar S. DeepCarc: Deep Learning-Powered Carcinogenicity Prediction Using Model-Level Representation. Front Artif Intell 2021; 4:757780. [PMID: 34870186 PMCID: PMC8636933 DOI: 10.3389/frai.2021.757780] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 08/12/2021] [Accepted: 10/27/2021] [Indexed: 12/16/2022]
Abstract
Carcinogenicity testing plays an essential role in identifying carcinogens in environmental chemistry and drug development. However, it is a time-consuming and labor-intensive process to evaluate the carcinogenic potency with conventional 2-year rodent animal studies. Thus, there is an urgent need for alternative approaches to providing reliable and robust assessments of carcinogenicity. In this study, we proposed a DeepCarc model to predict carcinogenicity for small molecules using deep learning-based model-level representations. The DeepCarc model was developed using a data set of 692 compounds and evaluated on a test set containing 171 compounds in the National Center for Toxicological Research liver cancer database (NCTRlcdb). As a result, the proposed DeepCarc model yielded a Matthews correlation coefficient (MCC) of 0.432 for the test set, outperforming four advanced deep learning (DL) powered quantitative structure-activity relationship (QSAR) models with an average improvement rate of 37%. Furthermore, the DeepCarc model was also employed to screen the carcinogenicity potential of the compounds from both DrugBank and Tox21. Altogether, the proposed DeepCarc model could serve as an early detection tool (https://github.com/TingLi2016/DeepCarc) for carcinogenicity assessment.
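For readers unfamiliar with the Matthews correlation coefficient cited in this abstract, the following is a minimal sketch of how MCC is computed from confusion-matrix counts. The function name and the example counts are hypothetical; the counts are chosen only so that they sum to a 171-compound test set and are not the confusion matrix reported by the DeepCarc authors.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Return the MCC for a binary classifier from confusion-matrix counts.

    By convention, 0.0 is returned when any row or column total is zero,
    because the coefficient is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (60 + 25 + 65 + 21 = 171).
example_mcc = matthews_corrcoef(tp=60, fp=25, tn=65, fn=21)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and unlike raw accuracy it stays informative when carcinogens and non-carcinogens are unevenly represented.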
Affiliation(s)
- Ting Li
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
- University of Arkansas at Little Rock and University of Arkansas for Medical Sciences Joint Bioinformatics Program, Little Rock, AR, United States
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
- Ruth Roberts
- ApconiX Ltd., Alderley Edge, United Kingdom
- Department of Biosciences, University of Birmingham, Birmingham, United Kingdom
- Zhichao Liu (corresponding author)
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
- Shraddha Thakkar (corresponding author)
- Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
4
Venkateswaran MR, Vadivel TE, Jayabal S, Murugesan S, Rajasekaran S, Periyasamy S. A review on network pharmacology based phytotherapy in treating diabetes- An environmental perspective. Environmental Research 2021; 202:111656. [PMID: 34265348 DOI: 10.1016/j.envres.2021.111656] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/13/2021] [Revised: 06/19/2021] [Accepted: 07/04/2021] [Indexed: 06/13/2023]
Abstract
Diabetes has become a common lifestyle disorder associated with obesity and cardiovascular diseases. Environmental factors like physical inactivity, polluted surroundings and unhealthy dieting also play a vital role in diabetes pathogenesis. As the current anti-diabetic drugs possess unprecedented side effects, traditional herbal medicine can be used as an alternative therapy. The paramount challenge with the usage of herbal formulations is the lack of standardized procedures, entangled with little knowledge of drug safety and mechanism of drug action. Heavy metal contamination is a major environmental hazard where plants tend to accumulate toxic metals like nickel, chromium and lead through industrial and agricultural activities. It becomes inappropriate to use these plants for phytotherapy as they may affect human health on long-term consumption. This review discusses the environmental risk factors related to diabetes and the better implication of medicinal plants in anti-diabetic therapy using network pharmacology, an in silico analytical tool that helps to unravel the multi-targeted action of herbal formulations rich in secondary metabolites. Also, a special focus is placed on pooling the databases regarding the medicinal plants for diabetes and associated diseases, their bioactive compounds, possible diabetic targets, drug-target interaction and toxicology reports that may open an avenue to safer, effective and toxicity-free drug discovery.
Affiliation(s)
- Meenakshi R Venkateswaran
- Department of Biotechnology, Anna University, BIT-Campus, Tiruchirappalli, 620024, Tamil Nadu, India
- Tamil Elakkiya Vadivel
- Department of Biotechnology, Anna University, BIT-Campus, Tiruchirappalli, 620024, Tamil Nadu, India
- Sasidharan Jayabal
- Department of Biotechnology, Anna University, BIT-Campus, Tiruchirappalli, 620024, Tamil Nadu, India
- Selvakumar Murugesan
- Department of Biotechnology, Anna University, BIT-Campus, Tiruchirappalli, 620024, Tamil Nadu, India
- Subbiah Rajasekaran
- Department of Biochemistry, ICMR-National Institute for Research in Environmental Health, Bhopal, India.
- Sureshkumar Periyasamy
- Department of Biotechnology, Anna University, BIT-Campus, Tiruchirappalli, 620024, Tamil Nadu, India.
5
Carrasquer CA, Malik N, States G, Qamar S, Cunningham S, Cunningham A. Chemical structure determines target organ carcinogenesis in rats. SAR and QSAR in Environmental Research 2012; 23:775-795. [PMID: 23066888 PMCID: PMC3547634 DOI: 10.1080/1062936x.2012.728996] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/01/2023]
Abstract
SAR models were developed for 12 rat tumour sites using data derived from the Carcinogenic Potency Database. Essentially, the models fall into two categories: Target Site Carcinogen-Non-Carcinogen (TSC-NC) and Target Site Carcinogen-Non-Target Site Carcinogen (TSC-NTSC). The TSC-NC models were composed of active chemicals that were carcinogenic to a specific target site and inactive ones that were whole animal non-carcinogens. On the other hand, the TSC-NTSC models used an inactive category also composed of carcinogens but to any/all other sites but the target site. Leave one out (LOO) validations produced an overall average concordance value for all 12 models of 0.77 for the TSC-NC models and 0.73 for the TSC-NTSC models. Overall, these findings suggest that while the TSC-NC models are able to distinguish between carcinogens and non-carcinogens, the TSC-NTSC models are identifying structural attributes that associate carcinogens to specific tumour sites. Since the TSC-NTSC models are composed of active and inactive compounds that are genotoxic and non-genotoxic carcinogens, the TSC-NTSC models may be capable of deciphering non-genotoxic mechanisms of carcinogenesis. Together, models of this type may also prove useful in anticancer drug development since they essentially contain chemical moieties that target a specific tumour site.
Affiliation(s)
- C. A. Carrasquer
- James Graham Brown Cancer Center, University of Louisville, Louisville, KY 40202
- N. Malik
- James Graham Brown Cancer Center, University of Louisville, Louisville, KY 40202
- G. States
- James Graham Brown Cancer Center, University of Louisville, Louisville, KY 40202
- S. Qamar
- James Graham Brown Cancer Center, University of Louisville, Louisville, KY 40202
- S.L. Cunningham
- James Graham Brown Cancer Center, University of Louisville, Louisville, KY 40202
- A.R. Cunningham
- James Graham Brown Cancer Center, University of Louisville, Louisville, KY 40202
- Department of Medicine, University of Louisville, Louisville, KY 40202
- Department of Pharmacology and Toxicology, University of Louisville, Louisville, KY 40202
6
Thomas RS, Black MB, Li L, Healy E, Chu TM, Bao W, Andersen ME, Wolfinger RD. A comprehensive statistical analysis of predicting in vivo hazard using high-throughput in vitro screening. Toxicol Sci 2012; 128:398-417. [PMID: 22543276 DOI: 10.1093/toxsci/kfs159] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.8] [Indexed: 12/25/2022]
Abstract
Over the past 5 years, increased attention has been focused on using high-throughput in vitro screening for identifying chemical hazards and prioritizing chemicals for additional in vivo testing. The U.S. Environmental Protection Agency's ToxCast program has generated a significant amount of high-throughput screening data allowing a broad-based assessment of the utility of these assays for predicting in vivo responses. In this study, a comprehensive cross-validation model comparison was performed to evaluate the predictive performance of the more than 600 in vitro assays from the ToxCast phase I screening effort across 60 in vivo endpoints using 84 different statistical classification methods. The predictive performance of the in vitro assays was compared and combined with that from chemical structure descriptors. With the exception of chronic in vivo cholinesterase inhibition, the overall predictive power of both the in vitro assays and the chemical descriptors was relatively low. The predictive power of the in vitro assays was not significantly different from that of the chemical descriptors and aggregating the assays based on genes reduced predictive performance. Prefiltering the in vitro assay data outside the cross-validation loop, as done in some previous studies, significantly biased estimates of model performance. The results suggest that the current ToxCast phase I assays and chemicals have limited applicability for predicting in vivo chemical hazards using standard statistical classification methods. However, if viewed as a survey of potential molecular initiating events and interpreted as risk factors for toxicity, the assays may still be useful for chemical prioritization.
Affiliation(s)
- Russell S Thomas
- The Hamner Institutes for Health Sciences, Research Triangle Park, North Carolina 27709, USA.
7
Lin WJ, Chen JJ. Class-imbalanced classifiers for high-dimensional data. Brief Bioinform 2012; 14:13-26. [DOI: 10.1093/bib/bbs006] [Citation(s) in RCA: 178] [Impact Index Per Article: 14.8] [Indexed: 12/19/2022]
8
Liu Z, Kelly R, Fang H, Ding D, Tong W. Comparative analysis of predictive models for nongenotoxic hepatocarcinogenicity using both toxicogenomics and quantitative structure-activity relationships. Chem Res Toxicol 2011; 24:1062-70. [PMID: 21627106 DOI: 10.1021/tx2000637] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Abstract
The primary testing strategy to identify nongenotoxic carcinogens largely relies on the 2-year rodent bioassay, which is time-consuming and labor-intensive. There is an increasing effort to develop alternative approaches to prioritize the chemicals for, supplement, or even replace the cancer bioassay. In silico approaches based on quantitative structure-activity relationships (QSAR) are rapid and inexpensive and thus have been investigated for such purposes. A slightly more expensive approach based on short-term animal studies with toxicogenomics (TGx) represents another attractive option for this application. Thus, the primary questions are how much better predictive performance using short-term TGx models can be achieved compared to that of QSAR models, and what length of exposure is sufficient for high quality prediction based on TGx. In this study, we developed predictive models for rodent liver carcinogenicity using gene expression data generated from short-term animal models at different time points and QSAR. The study was focused on the prediction of nongenotoxic carcinogenicity since the genotoxic chemicals can be inexpensively removed from further development using various in vitro assays individually or in combination. We identified 62 chemicals whose hepatocarcinogenic potential was available from the National Center for Toxicological Research liver cancer database (NCTRlcdb). The gene expression profiles of liver tissue obtained from rats treated with these chemicals at different time points (1 day, 3 days, and 5 days) are available from the Gene Expression Omnibus (GEO) database. Both TGx and QSAR models were developed on the basis of the same set of chemicals using the same modeling approach, a nearest-centroid method with a minimum redundancy and maximum relevancy-based feature selection with performance assessed using compound-based 5-fold cross-validation. We found that the TGx models outperformed QSAR in every aspect of modeling. For example, the TGx models' predictive accuracy (0.77, 0.77, and 0.82 for the 1-day, 3-day, and 5-day models, respectively) was much higher for an independent validation set than that of a QSAR model (0.55). Permutation tests confirmed the statistical significance of the model's prediction performance. The study concluded that a short-term 5-day TGx animal model holds the potential to predict nongenotoxic hepatocarcinogenicity.
Affiliation(s)
- Zhichao Liu
- Center of Excellence for Bioinformatics, National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, USA
9
Cunningham AR, Moss ST, Iype SA, Qian G, Qamar S, Cunningham SL. Structure-activity relationship analysis of rat mammary carcinogens. Chem Res Toxicol 2008; 21:1970-82. [PMID: 18759503 DOI: 10.1021/tx8001725] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Structure-activity relationship (SAR) models are powerful tools to investigate the mechanisms of action of chemical carcinogens and to predict the potential carcinogenicity of untested compounds. We describe here the application of the cat-SAR (categorical-SAR) program to two learning sets of rat mammary carcinogens. One set of developed models was based on a comparison of rat mammary carcinogens to rat noncarcinogens (MC-NC), and the second set compared rat mammary carcinogens to rat nonmammary carcinogens (MC-NMC). On the basis of a leave-one-out validation, the best rat MC-NC model achieved a concordance between experimental and predicted values of 84%, a sensitivity of 79%, and a specificity of 89%. Likewise, the best rat MC-NMC model achieved a concordance of 78%, a sensitivity of 82%, and a specificity of 74%. The MC-NMC model was based on a learning set that contained carcinogens in both the active (i.e., mammary carcinogens) and the inactive (i.e., carcinogens to sites other than the mammary gland) categories and was able to distinguish between these different types of carcinogens (i.e., tissue specific), not simply between carcinogens and noncarcinogens. On the basis of a structural comparison between this model and one for Salmonella mutagens, there was, as expected, a significant relationship between the two phenomena since a high proportion of breast carcinogens are Salmonella mutagens. However, when analyzing the specific structural features derived from the MC-NC learning set, a dichotomy was observed between fragments associated with mammary carcinogenesis and mutagenicity and others that were associated with estrogenic activity. Overall, these findings suggest that the MC-NC and MC-NMC models are able to identify structural attributes that may in part address the question of "why do some carcinogens cause breast cancer", which is a different question than "why do some chemicals cause cancer".
Affiliation(s)
- Albert R Cunningham
- James Graham Brown Cancer Center, Department of Medicine, University of Louisville, 529 South Jackson Street, Louisville, Kentucky 40202, USA.
10
Young JF, Tsai CA, Chen JJ, Latendresse JR, Kodell RL. Database composition can affect the structure-activity relationship prediction. Journal of Toxicology and Environmental Health, Part A 2006; 69:1527-40. [PMID: 16854783 DOI: 10.1080/15287390500468746] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/10/2023]
Abstract
The percent active (A) and inactive (I) chemicals in a database can directly affect the sensitivity (% active chemicals predicted correctly) and specificity (% inactive chemicals predicted correctly) of structure-activity relationship (SAR) analyses. Subdividing the National Center for Toxicological Research (NCTR) liver cancer database (NCTRlcdb) into various A/I ratios, which varied from 0.2 to 5.5, resulted in sensitivity/specificity ratios that varied from 0.1 to 6.5. As percent active chemicals increased (increasing A/I ratio), the sensitivity rose, the specificity decreased, and the concordance (% total chemicals predicted correctly) remained fairly constant. The numbers of chemicals in the various data sets ranged from 187 to 999 and appeared to have no effect on any of the 3 predictors of sensitivity, specificity, or concordance.
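The three measures defined in this abstract can be written out directly from confusion counts. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not taken from the NCTRlcdb analysis. It also makes explicit that concordance is the class-size-weighted average of sensitivity and specificity, so the A/I ratio determines how much each of the two contributes to it.

```python
def sar_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, concordance and A/I ratio from counts.

    tp/fn: active chemicals predicted correctly/incorrectly.
    tn/fp: inactive chemicals predicted correctly/incorrectly.
    """
    actives = tp + fn
    inactives = tn + fp
    if actives == 0 or inactives == 0:
        raise ValueError("need at least one active and one inactive chemical")
    sensitivity = tp / actives
    specificity = tn / inactives
    concordance = (tp + tn) / (actives + inactives)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "concordance": concordance,
        "ai_ratio": actives / inactives,
    }


# Hypothetical data set: 50 actives and 50 inactives (A/I ratio of 1.0).
m = sar_metrics(tp=40, fn=10, tn=30, fp=20)

# Concordance equals the weighted average of sensitivity and specificity,
# with weights given by the fraction of actives and inactives.
frac_active = 50 / 100
weighted = frac_active * m["sensitivity"] + (1 - frac_active) * m["specificity"]
```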
Affiliation(s)
- John F Young
- Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 72079-9502, USA.
11
Chen JJ, Tsai CA, Moon H, Ahn H, Young JJ, Chen CH. Decision threshold adjustment in class prediction. SAR and QSAR in Environmental Research 2006; 17:337-52. [PMID: 16815772 DOI: 10.1080/10659360600787700] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 05/10/2023]
Abstract
Standard classification algorithms are generally designed to maximize the number of correct predictions (concordance). The criterion of maximizing the concordance may not be appropriate in certain applications. In practice, some applications may emphasize high sensitivity (e.g., clinical diagnostic tests) and others may emphasize high specificity (e.g., epidemiology screening studies). This paper considers effects of the decision threshold on sensitivity, specificity, and concordance for four classification methods: logistic regression, classification tree, Fisher's linear discriminant analysis, and a weighted k-nearest neighbor. We investigated the use of decision threshold adjustment to improve performance of either sensitivity or specificity of a classifier under specific conditions. We conducted a Monte Carlo simulation showing that as the decision threshold increases, the sensitivity decreases and the specificity increases, but the concordance values in an interval around the maximum concordance are similar. For specified sensitivity and specificity levels, an optimal decision threshold might be determined in an interval around the maximum concordance that meets the specified requirement. Three example data sets were analyzed for illustration.
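The trade-off described in this abstract can be illustrated with a small threshold sweep. This is a sketch under stated assumptions, not the authors' simulation: the scores and labels are invented toy data, the function name is hypothetical, and a chemical is called active when its score is at or above the threshold.

```python
def metrics_at_threshold(scores, labels, threshold):
    """Return (sensitivity, specificity, concordance) at one decision threshold.

    A sample is predicted active (1) when its score is >= threshold.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)


# Invented classifier scores and true classes (1 = active, 0 = inactive).
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

# Raising the threshold trades sensitivity for specificity.
sweep = {t: metrics_at_threshold(scores, labels, t) for t in (0.25, 0.45, 0.65)}
for t, (sens, spec, conc) in sweep.items():
    print(f"threshold={t:.2f} sensitivity={sens:.2f} "
          f"specificity={spec:.2f} concordance={conc:.2f}")
```

On this toy data the two higher thresholds give the same concordance while sensitivity and specificity differ, which matches the abstract's point that several thresholds near the concordance maximum can be chosen according to which error type matters more.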
Affiliation(s)
- J J Chen
- Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 72079, USA.
12
Chen JJ, Tsai CA, Young JF, Kodell RL. Classification ensembles for unbalanced class sizes in predictive toxicology. SAR and QSAR in Environmental Research 2005; 16:517-29. [PMID: 16428129 DOI: 10.1080/10659360500468468] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Indexed: 05/06/2023]
Abstract
This paper investigates the effects of the ratio of positive-to-negative samples on the sensitivity, specificity, and concordance. When the class sizes in the training samples are not equal, the classification rule derived will favor the majority class and result in a low sensitivity on the minority class prediction. We propose an ensemble classification approach to adjust for differential class sizes in a binary classifier system. An ensemble classifier consists of a set of base classifiers; its prediction rule is based on a summary measure of individual classifications by the base classifiers. Two re-sampling methods, augmentation and abatement, are proposed to generate different bootstrap samples of equal class size to build the base classifiers. The augmentation method balances the two class sizes by bootstrapping additional samples from the minority class, whereas the abatement method balances the two class sizes by sampling only a subset of samples from the majority class. The proposed procedure is applied to a data set to predict estrogen receptor binding activity and to a data set to predict animal liver carcinogenicity using SAR (structure-activity relationship) models as base classifiers. The abatement method appears to perform well in balancing sensitivity and specificity.
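The two re-sampling schemes named in this abstract can be sketched as follows. This is a hedged illustration built only from the abstract's description: the function names are hypothetical, and the majority-vote combiner is an assumed stand-in for the "summary measure" of base-classifier outputs, which the abstract does not specify.

```python
import random


def abatement_sample(minority, majority, rng):
    """Balance classes by keeping every minority sample and drawing an
    equal-size random subset (without replacement) of the majority class."""
    return list(minority), rng.sample(list(majority), len(minority))


def augmentation_sample(minority, majority, rng):
    """Balance classes by keeping every majority sample and bootstrapping
    additional minority samples (with replacement) until the sizes match."""
    extra = [rng.choice(list(minority)) for _ in range(len(majority) - len(minority))]
    return list(minority) + extra, list(majority)


def ensemble_vote(votes):
    """Assumed summary rule: predict 1 when more than half of the base
    classifiers vote 1, otherwise predict 0."""
    return 1 if 2 * sum(votes) > len(votes) else 0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
actives = [f"active_{i}" for i in range(5)]       # hypothetical minority class
inactives = [f"inactive_{i}" for i in range(20)]  # hypothetical majority class

# Each base classifier would be trained on its own balanced sample.
abated = [abatement_sample(actives, inactives, rng) for _ in range(3)]
augmented = [augmentation_sample(actives, inactives, rng) for _ in range(3)]
```

Abatement discards majority-class information in any single sample but recovers it across the ensemble, whereas augmentation keeps all majority samples at the cost of repeating minority ones.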
Affiliation(s)
- J J Chen
- Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 72079, USA.