Li C, Colosi LM. Improving the usefulness of molecular similarity-based chemical prioritization strategies. SAR and QSAR in Environmental Research 2013; 24:679-694. [PMID: 23711116 DOI: 10.1080/1062936x.2013.792876]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Quantitative molecular similarity analysis (QMSA) is a seemingly useful tool for estimating environmental properties for the hundreds of emerging contaminants that have not yet been fully evaluated. Moreover, calibrated QMSA models are also useful for prioritizing research among currently unmeasured chemicals of interest. Previous work has demonstrated that prioritization based on molecular 'representativeness', as parameterized using summed Euclidean distances in n dimensions corresponding to n molecular descriptors, improves the prediction accuracy of QMSA models compared to random selection of compounds to be measured. In this study, we use two datasets of environmental parameters (i.e. in vitro oestrogenicity and sorption distribution coefficient Kd) to demonstrate that maximizing representativeness alone cannot deliver optimal improvement in prediction accuracy if many of the chemicals that have already been measured are themselves highly representative. Thus, proper QMSA-based prioritization among unmeasured chemicals constitutes a balance between maximizing representativeness and minimizing redundancy. It is demonstrated that redundancy considerations are especially critical for highly heterogeneous datasets, and some discussion about achieving a proper balance between the two prioritization criteria is presented.
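The trade-off described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' actual scoring formula, so everything here is an assumption made for illustration: representativeness is taken as the negative mean Euclidean distance from a chemical to all others in descriptor space (rank-equivalent to the summed distance), redundancy is approximated by closeness to the nearest already-measured chemical, and the function names and the `weight` parameter are hypothetical. Descriptors are assumed to be pre-scaled to comparable ranges.

```python
import math


def mean_distance(i, descriptors):
    """Mean Euclidean distance from chemical i to every other chemical.

    A lower value means chemical i is more 'representative' under the
    assumption made in this sketch.
    """
    others = [d for j, d in enumerate(descriptors) if j != i]
    return sum(math.dist(descriptors[i], d) for d in others) / len(others)


def nearest_measured_distance(i, descriptors, measured):
    """Distance from chemical i to its closest already-measured chemical.

    A small value means chemical i is largely redundant with existing data.
    """
    return min(math.dist(descriptors[i], descriptors[m]) for m in measured)


def rank_candidates(descriptors, measured, weight=0.5):
    """Rank unmeasured chemicals, best candidate first.

    weight=0.0 scores on representativeness only; weight=1.0 scores on
    non-redundancy only. The linear blend is a hypothetical choice, not
    the criterion used in the paper.
    """
    candidates = [i for i in range(len(descriptors)) if i not in measured]

    def score(i):
        representativeness = -mean_distance(i, descriptors)
        novelty = (nearest_measured_distance(i, descriptors, measured)
                   if measured else 0.0)
        return (1.0 - weight) * representativeness + weight * novelty

    return sorted(candidates, key=score, reverse=True)
```

When the most central chemical has already been measured, a representativeness-only ranking (`weight=0.0`) tends to favour its near neighbours, whereas raising `weight` moves chemicals far from the measured set up the list. How to set that balance is the question the abstract says the paper discusses; this sketch does not answer it.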