1
Dutschmann TM, Schlenker V, Baumann K. Chemoinformatic regression methods and their applicability domain. Mol Inform 2024; 43:e202400018. [PMID: 38803302 DOI: 10.1002/minf.202400018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/24/2024] [Accepted: 03/25/2024] [Indexed: 05/29/2024]
Abstract
The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.
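The abstract mentions generic procedures for defining the applicability domain (AD). One widely used generic definition is distance-based: a query compound is "inside" the AD if its mean distance to its k nearest training compounds does not exceed a cut-off derived from the training set itself. The sketch below assumes Euclidean distance on already-scaled descriptors, k = 3, and a cut-off of mean + z·std with z = 0.5 (a value often quoted in the QSAR literature); these choices, the function names, and the data are illustrative assumptions, not the method of the cited review.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_distance(query, reference, k):
    """Mean distance from `query` to its k nearest neighbours in `reference`."""
    dists = sorted(euclidean(query, r) for r in reference)
    return sum(dists[:k]) / k


def knn_ad_threshold(train, k=3, z=0.5):
    """Cut-off = mean + z * std of the training compounds' own mean k-NN
    distances, each computed against the remaining training compounds."""
    per_compound = [
        mean_knn_distance(x, train[:i] + train[i + 1:], k)
        for i, x in enumerate(train)
    ]
    return statistics.mean(per_compound) + z * statistics.stdev(per_compound)


def in_domain(query, train, k=3, z=0.5):
    """True if the query lies inside the distance-based applicability domain."""
    return mean_knn_distance(query, train, k) <= knn_ad_threshold(train, k, z)


# Hypothetical training set described by two scaled descriptors.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
print(in_domain((0.5, 0.4), train))  # True: interpolation inside the training cloud
print(in_domain((5.0, 5.0), train))  # False: far extrapolation
```

Leverage (hat-matrix) thresholds and model-specific uncertainty estimates are common alternatives; the distance form is shown only because it needs nothing beyond the descriptors.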
Affiliation(s)
- Thomas-Martin Dutschmann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Valerie Schlenker
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Knut Baumann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany

2
Zhuang W, Ben X, Zhou Z, Ding Y, Tang Y, Huang S, Deng C, Liao Y, Zhou Q, Zhao J, Wang G, Xu Y, Wen X, Zhang Y, Cai S, Chen R, Qiao G. Identification of a Ten-Gene Signature of DNA Damage Response Pathways with Prognostic Value in Esophageal Squamous Cell Carcinoma. Journal of Oncology 2021; 2021:3726058. [PMID: 34976055 PMCID: PMC8716225 DOI: 10.1155/2021/3726058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/28/2021] [Accepted: 11/27/2021] [Indexed: 02/06/2023]
Abstract
Molecular prognostic signatures are critical for treatment decision-making in esophageal squamous cell carcinoma (ESCC), but the robustness of these signatures is limited. An aberrant DNA damage response (DDR) pathway may lead to the accumulation of mutations and thus accelerate tumor progression in ESCC. Given this, we applied LASSO Cox regression to the transcriptomic data of DDR genes and constructed a prognostic DDR-related gene expression signature (DRGS) consisting of ten genes: PARP3, POLB, XRCC5, MLH1, DMC1, GTF2H3, PER1, SMC5, TCEA1, and HERC2. The DRGS was independently associated with overall survival in both the training and validation cohorts. The DRGS achieved higher accuracy than six previously reported multigene signatures for predicting prognosis in comparable cohorts. Furthermore, a nomogram incorporating the DRGS and clinicopathological features showed improved predictive performance. Taken together, the DRGS was identified as a novel, robust, and effective prognostic indicator, which may refine risk stratification and management in ESCC patients.
Affiliation(s)
- Weitao Zhuang
- Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Shantou University Medical College, Shantou 515041, China
- Xiaosong Ben
- Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Zihao Zhou
- Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Yu Ding
- Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou 510515, China
- Yong Tang
- Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Shujie Huang
- Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Shantou University Medical College, Shantou 515041, China
- Cheng Deng
- Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Yuchen Liao
- Burning Rock Biotech, Guangzhou 510300, China
- Jing Zhao
- Burning Rock Biotech, Guangzhou 510300, China
- Yu Xu
- Burning Rock Biotech, Guangzhou 510300, China
- Yuzi Zhang
- Burning Rock Biotech, Guangzhou 510300, China
- Shangli Cai
- Burning Rock Biotech, Guangzhou 510300, China
- Rixin Chen
- Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Research Center of Medical Sciences, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Guibin Qiao
- Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China

3
Sanchez-Cruz A, Robledo N, Rosete-Enríquez M, Romero-López AA. Attraction of Adults of Cyclocephala lunulata and Cyclocephala barrerai (Coleoptera: Scarabaeoidea: Melolonthidae) towards Bacteria Volatiles Isolated from Their Genital Chambers. Molecules (Basel, Switzerland) 2020; 25:molecules25194430. [PMID: 32992458 PMCID: PMC7582287 DOI: 10.3390/molecules25194430] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/31/2020] [Revised: 09/21/2020] [Accepted: 09/24/2020] [Indexed: 02/01/2023]
Abstract
In studies of the chemical communication of adults of the Melolonthidae family, bacteria have been observed in the epithelium of the genital chamber, and these bacteria may be involved in the production of sex attractants in their hosts. It is therefore important to identify the volatile organic compounds from bacteria (VOCsB) released by these microorganisms and to study the biological activity that VOCsB elicit in adult Melolonthidae. In this study, bacteria were isolated from the genital chambers of Cyclocephala lunulata and Cyclocephala barrerai; VOCsB were extracted from them using static headspace solid-phase microextraction (SHS-SPME) and dynamic headspace Super Q solid-phase extraction (DHS-SPE) and analyzed by gas chromatography-mass spectrometry. The effect of VOCsB on the hosts and conspecifics was evaluated using an olfactometer and electroantennography (EAG). Two species of Enterobacteria were isolated from the genital chamber of the females of each beetle species, and VOCsB derived from sulfur-containing compounds, alcohols, esters, and fatty acids were identified. An attraction response was observed in olfactometry studies, and antennal responses to VOCsB were confirmed in EAG bioassays. These results open new perspectives on the relationship between these beetles and their bacteria and establish a basis for future management programs.
Affiliation(s)
- Abraham Sanchez-Cruz
- Laboratorio de Ecología Química de Insectos, Centro de Desarrollo de Productos Bióticos, Instituto Politécnico Nacional. Carretera Yautepec-Jojutla, Km. 6, calle CEPROBI No. 8, Col. San Isidro, Yautepec, Morelos C.P. 62731, Mexico
- Norma Robledo
- Laboratorio de Ecología Química de Insectos, Centro de Desarrollo de Productos Bióticos, Instituto Politécnico Nacional. Carretera Yautepec-Jojutla, Km. 6, calle CEPROBI No. 8, Col. San Isidro, Yautepec, Morelos C.P. 62731, Mexico
- Correspondence: Tel.: +52-(735)-3942020 (N.R.); +52-(735)-3941896 (N.R.); +52-(222)-2295500 (A.A.R.-L.)
- María Rosete-Enríquez
- Laboratorio de Macromoléculas, Facultad de Ciencias Biológicas, Benemérita Universidad Autónoma de Puebla, Boulevard Capitán Carlos Camacho Espíritu, Edificio 112-A, Ciudad Universitaria, Col. Jardines de San Manuel, Puebla C. P. 72570, Mexico
- Angel A. Romero-López
- Laboratorio de Infoquímicos y Otros compuestos Bióticos, Facultad de Ciencias Biológicas, Benemérita Universidad Autónoma de Puebla, Boulevard Capitán Carlos Camacho Espíritu, Edificio 112-A, Ciudad Universitaria, Col. Jardines de San Manuel, Puebla C. P. 72570, Mexico

4
Zhou T, Cai Z, Ma N, Xie W, Gao C, Huang M, Bai Y, Ni Y, Tang Y. A Novel Ten-Gene Signature Predicting Prognosis in Hepatocellular Carcinoma. Front Cell Dev Biol 2020; 8:629. [PMID: 32760725 PMCID: PMC7372135 DOI: 10.3389/fcell.2020.00629] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 03/05/2020] [Accepted: 06/23/2020] [Indexed: 01/27/2023]
Abstract
Hepatocellular carcinoma (HCC) has a dismal long-term outcome. We aimed to construct a multi-gene model for prognosis prediction to inform HCC management. Cancer-specific differentially expressed genes (DEGs) were identified using RNA-seq data of paired tumor and normal tissue. A prognostic signature was built by LASSO regression analysis. Gene set enrichment analysis (GSEA) was performed to further understand the underlying molecular mechanisms. A 10-gene signature was constructed to stratify the TCGA and ICGC cohorts into high- and low-risk groups, and prognosis was significantly worse in the high-risk group across cohorts (P < 0.001 for all). The 10-gene signature outperformed all previously reported models in both C-index and AUCs for 1-, 3-, and 5-year survival prediction (C-index, 0.84 vs 0.67 to 0.73; AUCs for 1-, 3- and 5-year OS, 0.84 vs 0.68 to 0.79, 0.81 vs 0.68 to 0.80, and 0.85 vs 0.67 to 0.78, respectively). Multivariate Cox regression analysis revealed risk group and tumor stage to be independent predictors of survival in HCC. A nomogram incorporating tumor stage and signature-based risk group showed better performance for 1- and 3-year survival than for 5-year survival. GSEA revealed enrichment of pathways related to cell cycle regulation among high-risk samples and of metabolic processes in the low-risk group. Our 10-gene model is robust for prognosis prediction and may help inform clinical management of HCC.
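LASSO builds a compact gene signature because its L1 penalty shrinks many coefficients exactly to zero. The sketch below shows that mechanism with cyclic coordinate descent and soft-thresholding for the ordinary least-squares objective (1/2n)·‖y − Xb‖² + λ‖b‖₁. It assumes centred data without an intercept and uses a tiny hypothetical dataset; survival studies such as this one typically penalise a Cox partial likelihood instead, which this sketch does not implement.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes the columns of X and the response y are already centred (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = 0.0
            for row, target in zip(X, y):
                partial_fit = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (target - partial_fit)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Hypothetical data: the response depends only on the first feature.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
print(lasso_cd(X, y, lam=0.5))  # [1.5, 0.0]: true slope 2 shrunk by lam, noise feature dropped
print(lasso_cd(X, y, lam=3.0))  # [0.0, 0.0]: penalty large enough to remove every feature
```

In practice λ is chosen by cross-validation, and the genes with non-zero coefficients at that λ form the signature.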
Affiliation(s)
- Taicheng Zhou
- Department of Gastroenterological Surgery and Hernia Center, The Sixth Affiliated Hospital of Sun Yat-sen University, Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, Supported by National Key Clinical Discipline, Guangzhou, China
- Zhihua Cai
- Department of Oncology, The Affiliated Cancer Hospital of Guangzhou Medical University, Guangzhou, China
- Ning Ma
- Department of Gastroenterological Surgery and Hernia Center, The Sixth Affiliated Hospital of Sun Yat-sen University, Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, Supported by National Key Clinical Discipline, Guangzhou, China
- Wenzhuan Xie
- The Medical Department, 3D Medicines Inc., Shanghai, China
- Chan Gao
- The Medical Department, 3D Medicines Inc., Shanghai, China
- Mengli Huang
- The Medical Department, 3D Medicines Inc., Shanghai, China
- Yuezong Bai
- The Medical Department, 3D Medicines Inc., Shanghai, China
- Yangpeng Ni
- Department of Oncology, Jieyang People's Hospital, Sun Yat-sen University, Jieyang, China
- Yunqiang Tang
- Department of Hepatic-Biliary Surgery, The Affiliated Cancer Hospital of Guangzhou Medical University, Guangzhou, China

5
Qin L, Sun Q, Wang Y, Wu KF, Chen M, Shia BC, Wu SY. Prediction of Number of Cases of 2019 Novel Coronavirus (COVID-19) Using Social Media Search Index. International Journal of Environmental Research and Public Health 2020; 17:ijerph17072365. [PMID: 32244425 PMCID: PMC7177617 DOI: 10.3390/ijerph17072365] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 02/28/2020] [Revised: 03/29/2020] [Accepted: 03/30/2020] [Indexed: 01/02/2023]
Abstract
Predicting the number of new suspected or confirmed cases of novel coronavirus disease 2019 (COVID-19) is crucial for preventing and controlling the COVID-19 outbreak. Social media search indexes (SMSI) for dry cough, fever, chest distress, coronavirus, and pneumonia were collected from 31 December 2019 to 9 February 2020. Data on new suspected COVID-19 cases were collected from 20 January 2020 to 9 February 2020. We used the lagged series of SMSI to predict new suspected COVID-19 case numbers during this period. To avoid overfitting, five methods, namely subset selection, forward selection, lasso regression, ridge regression, and elastic net, were used to estimate coefficients. We selected the optimal method to predict new suspected COVID-19 case numbers from 20 January 2020 to 9 February 2020. We further validated the optimal method for new confirmed cases of COVID-19 from 31 December 2019 to 17 February 2020. The new suspected COVID-19 case numbers correlated significantly with the lagged series of SMSI. SMSI could be detected 6–9 days earlier than new suspected cases of COVID-19. The optimal method was the subset selection method, which had the lowest estimation error and a moderate number of predictors. After validation, the subset selection method also correlated significantly with new confirmed COVID-19 cases. SMSI findings on lag day 10 were significantly correlated with new confirmed COVID-19 cases. SMSI could be a significant predictor of the number of COVID-19 infections. SMSI could be an effective early predictor, enabling government health departments to locate potential and high-risk outbreak areas.
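A "lagged series" pairs each day's case count with the search index observed a fixed number of days earlier. The sketch below, assuming two equal-length daily series and hypothetical data, builds those pairs and screens lags by Pearson correlation. It covers only the lag-screening idea; the study then fitted subset selection, lasso, ridge, and elastic net models on the lagged predictors, which is not reproduced here.

```python
import math


def lagged_pairs(x, y, lag):
    """Pair y[t] with x[t - lag] for every day t where both values exist."""
    return [(x[t - lag], y[t]) for t in range(lag, len(y))]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def best_lag(x, y, max_lag):
    """Return the lag with the highest correlation, plus all per-lag scores."""
    scores = {lag: pearson(lagged_pairs(x, y, lag)) for lag in range(max_lag + 1)}
    return max(scores, key=scores.get), scores


# Hypothetical daily search index; case counts simply echo it three days later.
search_index = [1.0, 5.0, 2.0, 8.0, 3.0, 9.0, 4.0, 7.0, 6.0, 10.0, 2.0, 5.0]
new_cases = [0.0, 0.0, 0.0] + search_index[:-3]
lag, scores = best_lag(search_index, new_cases, max_lag=5)
print(lag)  # 3
```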
Affiliation(s)
- Lei Qin
- School of Statistics, University of International Business and Economics, Beijing 100029, China
- Qiang Sun
- School of Statistics, University of International Business and Economics, Beijing 100029, China
- Yidan Wang
- School of Statistics, University of International Business and Economics, Beijing 100029, China
- Ke-Fei Wu
- Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Mingchih Chen
- Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Ben-Chang Shia
- Research Center of Big Data, College of Management, Taipei Medical University, Taipei 110, Taiwan
- College of Management, Taipei Medical University, Taipei 110, Taiwan
- Executive Master Program of Business Administration in Biotechnology, College of Management, Taipei Medical University, Taipei 110, Taiwan
- Szu-Yuan Wu
- Department of Food Nutrition and Health Biotechnology, College of Medical and Health Science, Asia University, Taichung 41354, Taiwan
- Division of Radiation Oncology, Lo-Hsu Medical Foundation, Lotung Poh-Ai Hospital, Yilan 265, Taiwan
- Big Data Center, Lo-Hsu Medical Foundation, Lotung Poh-Ai Hospital, Yilan 265, Taiwan
- Department of Healthcare Administration, College of Medical and Health Science, Asia University, Taichung 41354, Taiwan
- School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei 110, Taiwan

6
Impact of Urban Surface Characteristics and Socio-Economic Variables on the Spatial Variation of Land Surface Temperature in Lagos City, Nigeria. Sustainability 2018. [DOI: 10.3390/su11010025] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 11/16/2022]
Abstract
The urban heat island (UHI) and its consequences have become a key research focus in various disciplines because of their negative externalities on urban ecology and the overall livability of cities. Identifying the spatial variation of land surface temperature (LST) provides a clear picture of the UHI phenomenon and helps in introducing appropriate mitigation techniques to address the adverse impacts of UHI. Hence, the aim of this research is to examine the spatial variation of LST in relation to the UHI phenomenon in rapidly urbanizing Lagos City. Four variables were examined to identify the impact of urban surface characteristics and socio-economic activities on LST. Gradient analysis was employed to assess the distribution of LST from the city center to rural areas over vegetated and built-up areas. Partial least squares (PLS) regression analysis was used to assess the correlation and statistical significance of the variables. Landsat data captured in 2002 and 2013 were used as the primary data sources, and other gridded data, such as PD and FFCOE, were also employed. The results show that the distribution pattern of LST changed between 2002 and 2013 as a result of changing urban surface characteristics (USC) and the influence of socio-economic activities. LST has a strong positive relationship with the normalized difference built-up index (NDBI) and a strong negative relationship with the normalized difference vegetation index (NDVI). The rapid development of Lagos City has directly driven the conversion of green areas to built-up areas over time, which has resulted in the formation of more surface urban heat islands (SUHI). Further, the increasing population and its socio-economic activities, including industrialization and infrastructure development, have also had a significant impact on LST changes.
We recommend that the results of this research be used as a proxy tool for introducing appropriate landscape and town planning from a sustainability viewpoint, to create healthier and more livable urban environments in Lagos City, Nigeria.

7
Rácz A, Bajusz D, Héberger K. Modelling methods and cross-validation variants in QSAR: a multi-level analysis. SAR and QSAR in Environmental Research 2018; 29:661-674. [PMID: 30160175 DOI: 10.1080/1062936x.2018.1505778] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/11/2018] [Accepted: 07/24/2018] [Indexed: 06/08/2023]
Abstract
Prediction performance often depends on the cross- and test validation protocols applied. Several combinations of different cross-validation variants and model-building techniques were used to reveal their complexity. Two case studies (acute toxicity data) were examined, applying five-fold cross-validation (with random, contiguous and Venetian blind forms) and leave-one-out cross-validation (CV). External test sets showed the effects and differences between the validation protocols. The models were generated with multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression, artificial neural networks (ANN) and support vector machines (SVM). The comparisons were made by the sum of ranking differences (SRD) and factorial analysis of variance (ANOVA). The largest bias and variance could be assigned to the MLR method and contiguous block cross-validation. SRD can provide a unique and unambiguous ranking of methods and CV variants. Venetian blind cross-validation is a promising tool. The generated models were also compared based on their basic performance parameters (r2 and Q2). MLR produced the largest gap, while PCR gave the smallest. Although PCR is the best validated and balanced technique, SVM always outperformed the other methods, when experimental values were the benchmark. Variable selection was advantageous, and the modelling had a larger influence than CV variants.
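The cross-validation variants compared in this study differ only in how samples are assigned to folds. The sketch below, assuming samples are indexed 0..n−1 in their stored order, shows random, contiguous-block, Venetian-blind (sample i goes to fold i mod k), and leave-one-out assignment. Contiguous and Venetian-blind folds behave differently from random ones only when the stored order carries information (for example, data sorted by response). Function names are illustrative, and software packages may distribute leftover samples differently.

```python
import random


def random_folds(n, k, seed=0):
    """Shuffle the indices, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[f::k]) for f in range(k)]


def contiguous_folds(n, k):
    """Consecutive blocks; the first n % k blocks receive one extra sample."""
    size, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        stop = start + size + (1 if f < extra else 0)
        folds.append(list(range(start, stop)))
        start = stop
    return folds


def venetian_blind_folds(n, k):
    """Every k-th sample: sample i is assigned to fold i mod k."""
    return [list(range(f, n, k)) for f in range(k)]


def loo_folds(n):
    """Leave-one-out: each sample forms its own test fold."""
    return [[i] for i in range(n)]


print(contiguous_folds(10, 5))      # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
print(venetian_blind_folds(10, 5))  # [[0, 5], [1, 6], [2, 7], [3, 8], [4, 9]]
```

In each round, one fold is held out as the test set and the model is refitted on the union of the remaining folds.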
Affiliation(s)
- A Rácz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- D Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- K Héberger
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary

8
Singer M, Krivobokova T, Munk A, de Groot B. Partial least squares for dependent data. Biometrika 2016; 103:351-362. [DOI: 10.1093/biomet/asw010] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]

9
Daghir-Wojtkowiak E, Wiczling P, Bocian S, Kubik Ł, Kośliński P, Buszewski B, Kaliszan R, Markuszewski MJ. Least absolute shrinkage and selection operator and dimensionality reduction techniques in quantitative structure retention relationship modeling of retention in hydrophilic interaction liquid chromatography. J Chromatogr A 2015; 1403:54-62. [DOI: 10.1016/j.chroma.2015.05.025] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 02/25/2015] [Revised: 05/07/2015] [Accepted: 05/11/2015] [Indexed: 12/16/2022]

10
Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model 2012; 52:1413-37. [PMID: 22582859 DOI: 10.1021/ci200409x] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Indexed: 01/19/2023]
Abstract
This paper focuses on modern approaches to machine learning, most of which are as yet used infrequently or not at all in chemoinformatics. Machine learning methods are characterized in terms of the "modes of statistical inference" and "modeling levels" nomenclature and by considering different facets of modeling with respect to input/output matching, data types, model duality, and model inference. Particular attention is paid to new approaches and concepts that may provide efficient solutions to common problems in chemoinformatics: improvement of the predictive performance of structure-property (activity) models, generation of structures possessing desirable properties, model applicability domain, modeling of properties with functional endpoints (e.g., phase diagrams and dose-response curves), and accounting for multiple molecular species (e.g., conformers or tautomers).
Affiliation(s)
- Alexandre Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France.

11
Fatemi MH, Elyasi M. Prediction of gas chromatographic retention indices of some amino acids and carboxylic acids from their structural descriptors. J Sep Sci 2011; 34:3216-20. [DOI: 10.1002/jssc.201100544] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/24/2011] [Revised: 08/24/2011] [Accepted: 08/26/2011] [Indexed: 11/10/2022]

12
Jalali-Heravi M, Ebrahimi-Najafabadi H. Modeling of retention behaviors of most frequent components of essential oils in polar and non-polar stationary phases. J Sep Sci 2011; 34:1538-46. [DOI: 10.1002/jssc.201100042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/24/2011] [Revised: 03/26/2011] [Accepted: 04/10/2011] [Indexed: 11/09/2022]

13
Abstract
Characterizing dynamic gene expression patterns and predicting patient outcome are already important and will attract more interest as large-scale clinical microarray studies expand. However, no method has yet been developed for predicting patient outcome from longitudinal gene expression, in which patients' gene expression is monitored over time. Here, we propose a novel approach for predicting patient survival time that makes use of the time-course structure of gene expression. The method is applied to a burn study. The genes involved in the final predictors are enriched in inflammatory response and immune system-related pathways. Moreover, our method consistently outperforms prediction methods that use gene expression from individual time points or simply pool gene expression across time points.

14
Ghavami R, Faham S. QSRR Models for Kováts' Retention Indices of a Variety of Volatile Organic Compounds on Polar and Apolar GC Stationary Phases Using Molecular Connectivity Indexes. Chromatographia 2010; 72:893-903. [PMID: 21088689 PMCID: PMC2965364 DOI: 10.1365/s10337-010-1741-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 04/26/2010] [Revised: 07/13/2010] [Accepted: 08/02/2010] [Indexed: 11/29/2022]
Abstract
Quantitative structure-retention relationship (QSRR) approaches based on molecular connectivity indices are useful for predicting the gas chromatographic Kováts relative retention indices (GC-RRIs) of 132 volatile organic compounds (VOCs) on 12 different stationary phases (4 apolar and 8 polar; C67, C103, C78, C∞, POH, TTF, MTF, PCL, PBR, TMO, PSH and PCN) at 130 °C. Full geometry optimization was carried out with the Austin Model 1 (AM1) semi-empirical molecular orbital method. A set of 30 molecular descriptors was derived directly from the topological structures of the compounds using the DRAGON program. Using elimination-selection stepwise regression as the final variable selection method, three optimal descriptors were selected to build a QSRR model predicting the RRI of organic compounds on each stationary phase, with a correlation coefficient between 0.9378 and 0.9673 and a leave-one-out cross-validation correlation coefficient between 0.9325 and 0.9653. The root-mean-square errors over the 12 phases were within the range 0.0333–0.0458. Furthermore, the accuracy of all developed models was confirmed by Y-randomization and by external validation, using both an odd–even numbering split and a division of the entire dataset into training and test sets. QSRR successfully interpreted the complex relationship between the GC RRIs of VOCs and their chemical structures. The three connectivity indices in the models were also rationally interpreted, indicating that the RRIs of all the organic compounds were precisely represented by molecular connectivity indices.
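Y-randomization checks for chance correlation: the response is shuffled, the model is refitted, and the fit on scrambled data should be far worse than on the real data. The sketch below is a minimal illustration using a single-descriptor least-squares fit (whose r² equals the squared Pearson r) and hypothetical data; the study's three-descriptor MLR models would be refitted the same way on each shuffled response.

```python
import random


def r_squared(x, y):
    """r^2 of a one-descriptor least-squares fit (equal to squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_randomization(x, y, n_rounds=100, seed=0):
    """Return the r^2 of the real model and the r^2 values after shuffling y."""
    rng = random.Random(seed)
    r2_true = r_squared(x, y)
    r2_perm = []
    for _ in range(n_rounds):
        y_shuffled = y[:]  # copy so the original response order is preserved
        rng.shuffle(y_shuffled)
        r2_perm.append(r_squared(x, y_shuffled))
    return r2_true, r2_perm


# Hypothetical descriptor and a perfectly linear response.
descriptor = [float(i) for i in range(20)]
response = [2.0 * v + 1.0 for v in descriptor]
r2_true, r2_perm = y_randomization(descriptor, response)
print(round(r2_true, 6))  # 1.0
# Fraction of scrambled models that fit at least as well as the real one.
print(sum(r >= r2_true for r in r2_perm) / len(r2_perm))
```

A real model passes the test when the scrambled r² values cluster far below the original and almost none of them reach it.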
Affiliation(s)
- Raouf Ghavami
- Department of Chemistry, Faculty of Science, University of Kurdistan, P.O. Box 416, Sanandaj, Iran

15
Rykowska I, Bielecki P, Wasiak W. Retention indices and quantum-chemical descriptors of aromatic compounds on stationary phases with chemically bonded copper complexes. J Chromatogr A 2010; 1217:1971-6. [DOI: 10.1016/j.chroma.2010.01.073] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/23/2009] [Revised: 01/15/2010] [Accepted: 01/22/2010] [Indexed: 11/30/2022]

16

17
Predicting liquid chromatographic retention times of peptides from the Drosophila melanogaster proteome by machine learning approaches. Anal Chim Acta 2009; 644:10-6. [DOI: 10.1016/j.aca.2009.04.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 11/18/2008] [Revised: 03/29/2009] [Accepted: 04/07/2009] [Indexed: 11/22/2022]

18
Farkas O, Zenkevich IG, Stout F, Kalivas JH, Héberger K. Prediction of retention indices for identification of fatty acid methyl esters. J Chromatogr A 2008; 1198-1199:188-95. [DOI: 10.1016/j.chroma.2008.05.019] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 03/05/2008] [Revised: 05/06/2008] [Accepted: 05/08/2008] [Indexed: 10/22/2022]

19
Luan F, Liu HT, Wen Y, Zhang X. Quantitative structure-property relationship study for estimation of quantitative calibration factors of some organic compounds in gas chromatography. Anal Chim Acta 2008; 612:126-35. [PMID: 18358857 DOI: 10.1016/j.aca.2008.02.037] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 11/02/2007] [Revised: 02/20/2008] [Accepted: 02/21/2008] [Indexed: 11/18/2022]
Abstract
Quantitative structure-property relationship (QSPR) models have been used to predict and explain gas chromatographic data of quantitative calibration factors (f(M)). This method allows for the prediction of quantitative calibration factors in a variety of organic compounds based on their structures alone. Stepwise multiple linear regression (MLR) and non-linear radial basis function neural network (RBFNN) were performed to build the models. The statistical characteristics provided by multiple linear model (R2=0.927, RMS=0.073; AARD=6.34% for test set) indicated satisfactory stability and predictive ability, while the predictive ability of RBFNN model is somewhat superior (R2=0.959; RMS=0.0648; AARD=4.85% for test set). This QSPR approach can contribute to a better understanding of structural factors of the compounds responsible for quantitative analysis by gas chromatography, and can be useful in predicting the quantitative calibration factors of other compounds.
Affiliation(s)
- Feng Luan
- Department of Applied Chemistry, Yantai University, Yantai 264005, PR China.

20
Affiliation(s)
- Roman Kaliszan
- Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Gen. J. Hallera 107, 80416 Gdańsk, Poland.

21
Zhu XH, Wang W, Schramm KW, Niu W. Prediction of the Kováts Retention Indices of Thiols by Use of Quantum Chemical and Physicochemical Descriptors. Chromatographia 2007. [DOI: 10.1365/s10337-007-0237-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]

22
Héberger K. Quantitative structure-(chromatographic) retention relationships. J Chromatogr A 2007; 1158:273-305. [PMID: 17499256 DOI: 10.1016/j.chroma.2007.03.108] [Citation(s) in RCA: 268] [Impact Index Per Article: 15.8] [Received: 01/25/2007] [Revised: 03/13/2007] [Accepted: 03/19/2007] [Indexed: 01/30/2023]
Abstract
Since the pioneering works of Kaliszan (R. Kaliszan, Quantitative Structure-Chromatographic Retention Relationships, Wiley, New York, 1987; and R. Kaliszan, Structure and Retention in Chromatography. A Chemometric Approach, Harwood Academic, Amsterdam, 1997), no comprehensive summary of the field has been available. The present review covers the period from 1996 to August 2006. The sources are grouped according to the type of chromatography: quantitative structure-retention relationships in gas chromatography, planar chromatography, column liquid chromatography, micellar liquid chromatography and affinity chromatography, as well as quantitative structure-enantioselective retention relationships. General tendencies, misleading practices and conclusions, model validation and suggestions for future work are summarized for each sub-field. Some straightforward applications, rather than standard ones, are emphasized. The sources, model compounds, descriptors, predicted retention data, modeling methods and their performance indicators, model validation and stationary phases are collected in tables. Some important conclusions are as follows. Not all physicochemical descriptors correlate strongly with retention data; the heat of formation is not related to chromatographic retention. It is not appropriate to give the errors of Kováts indices in percentages, because the apparently low values (1-3%) can mislead reviewers and readers. The contemporary mean interlaboratory reproducibility of Kováts indices is about 5-10 i.u. for standard non-polar phases and 10-25 i.u. for standard polar phases. The predictive performance of QSRR models deteriorates as the polarity of the GC stationary phase increases. The correlation coefficient alone is not a particularly good indicator of model performance. Residuals are more useful than plots of measured versus calculated values. There is no need to give the retention data in the form of an equation if the number of compounds is small. The applicability domain of the models should be given in all cases.
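To make the point about percentage errors concrete, the short Python sketch below converts a relative error into index units and compares it with the reproducibility ranges quoted in this abstract. The index value of 1500 and both function names are hypothetical illustrations, not taken from the review.

```python
def percent_to_index_units(retention_index, percent_error):
    """Convert a relative error (%) on a Kováts index into absolute index units (i.u.)."""
    return retention_index * percent_error / 100.0


def exceeds_reproducibility(error_iu, reproducibility_iu):
    """True if an absolute prediction error is larger than the interlaboratory reproducibility."""
    return error_iu > reproducibility_iu


# Hypothetical example: a "small-looking" 2 % error on a Kováts index of 1500 i.u.
error_iu = percent_to_index_units(1500, 2.0)
print(error_iu)                               # 30.0
print(exceeds_reproducibility(error_iu, 10))  # True: above the upper non-polar value
print(exceeds_reproducibility(error_iu, 25))  # True: above the upper polar value
```

A 2 % error thus corresponds to 30 i.u. here, several times the 5-10 i.u. reproducibility on standard non-polar phases, which is why absolute errors in index units are the more informative figure.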
Affiliation(s)
- Károly Héberger
- Chemical Research Center, Hungarian Academy of Sciences, P.O. Box 17, H-1525 Budapest, Hungary.
23
Tulasamma P, Reddy KS. Quantitative structure and retention relationships for gas chromatographic data: Application to alkyl pyridines on apolar and polar phases. J Mol Graph Model 2006; 25:507-13. [PMID: 16713723 DOI: 10.1016/j.jmgm.2006.04.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 11/24/2005] [Revised: 04/01/2006] [Accepted: 04/03/2006] [Indexed: 11/19/2022]
Abstract
Quantitative structure-retention relationships (QSRR) have been developed to model gas chromatographic retention data of alkyl pyridines on apolar (branched alkane) and polar (primary alcohol) stationary phases. The retention properties analyzed are the Kováts retention index, I; the partial molar enthalpy, ΔH; the partial molar entropy, ΔS; and the partition coefficient, log K. Using the seven valence molecular connectivity indices (χ) calculated for the 18 alkyl pyridines, regression models were generated to predict the retention properties. The best model (model A), obtained with the descriptors $^{1}\chi^{v}_{P}$, $^{3}\chi^{v}_{P}$ and $^{6}\chi^{v}_{CH}$, did not achieve satisfactory statistical performance or the correct elution order. The model was therefore modified (model B) by including a steric parameter, s, derived empirically from the steric effects of alkyl groups at the ortho and meta positions. The modified model predicts the correct elution order for all the alkyl pyridines and gives good correlation coefficients, r. For I, ΔH, ΔS and log K, respectively, r increases from 0.955, 0.975, 0.984 and 0.955 (model A) to 0.999, 0.996, 0.990 and 0.999 (model B) on the apolar stationary phase, and from 0.931, 0.926, 0.911 and 0.931 (model A) to 0.999, 0.992, 0.958 and 0.999 (model B) on the polar stationary phase.
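The descriptors in model A are Kier-Hall valence connectivity indices. A minimal Python sketch of the first-order term, $^{1}\chi^{v} = \sum_{\text{bonds } ij} (\delta_i^{v}\delta_j^{v})^{-1/2}$, is shown below for the parent pyridine ring. It assumes $\delta^{v} = Z^{v} - h$ (valence electrons minus attached hydrogens), which holds for second-period atoms such as C and N. Higher-order path and chain terms such as $^{3}\chi^{v}_{P}$ and $^{6}\chi^{v}_{CH}$ sum the analogous products over longer subgraphs and are not implemented here; the function name is hypothetical.

```python
import math


def first_order_valence_chi(delta_v, bonds):
    """First-order valence connectivity index (Kier-Hall):
    sum over bonds (i, j) of (delta_v[i] * delta_v[j]) ** -0.5."""
    return sum((delta_v[i] * delta_v[j]) ** -0.5 for i, j in bonds)


# Hydrogen-suppressed graph of pyridine: atom 0 is N, atoms 1-5 are aromatic CH.
# delta_v = Z_v - h for second-period atoms: N -> 5 - 0 = 5, aromatic CH -> 4 - 1 = 3.
pyridine_delta_v = [5, 3, 3, 3, 3, 3]
pyridine_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

chi1v = first_order_valence_chi(pyridine_delta_v, pyridine_bonds)
print(round(chi1v, 4))  # 1.8497, i.e. 2/sqrt(15) + 4/3
```

An alkyl substituent would add atoms and bonds to the graph and change the δ value of the ring carbon it is attached to (one hydrogen fewer), which is how the index distinguishes the alkyl pyridines.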
Affiliation(s)
- Palagiri Tulasamma
- Department of Chemistry, Sri Venkateswara University, Tirupati 517502, India
24
25
Abstract
We propose a new classification method for predicting drug properties, called random feature subset boosting for linear discriminant analysis (LDA). Its main novelty is that it overcomes the problems of constructing ensembles of linear discriminant models based on generalized eigenvectors of covariance matrices. Such linear models are popular for building classification-based structure-activity relationships. Ensembles of LDA models allow more complex problems to be analyzed than a single LDA model can handle, for example problems involving multiple mechanisms of action. Using four data sets, we show experimentally that the method is competitive with other recently studied chemoinformatic methods, including support vector machines and models based on decision trees. We present an easy scheme for interpreting the model despite its apparent sophistication. We also outline theoretical evidence for why, unlike the conventional AdaBoost ensemble algorithm, this method is able to increase the accuracy of LDA models.
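The abstract does not give the algorithm itself, so the Python sketch below shows only the generic idea its name suggests: discrete AdaBoost in which each weak learner is a weighted two-class Fisher LDA fitted on a randomly drawn feature subset. It is an illustration under stated assumptions, not Arodź's published method (in particular, it does not reproduce how that method deals with the generalized-eigenvector problem). All function names, the subset size, the ridge term and the toy data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_weighted_lda(X, y, w, feats, ridge=1e-6):
    """Weighted two-class Fisher LDA (labels +1/-1) restricted to the feature indices `feats`."""
    k = len(feats)
    sub = [[x[f] for f in feats] for x in X]
    means = {}
    for c in (1, -1):
        wc = sum(wi for wi, yi in zip(w, y) if yi == c)
        means[c] = [sum(wi * xi[j] for wi, xi, yi in zip(w, sub, y) if yi == c) / wc
                    for j in range(k)]
    # Pooled, weighted within-class scatter plus a small ridge for numerical stability.
    s = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    for wi, xi, yi in zip(w, sub, y):
        d = [xi[j] - means[yi][j] for j in range(k)]
        for i in range(k):
            for j in range(k):
                s[i][j] += wi * d[i] * d[j]
    direction = solve(s, [means[1][j] - means[-1][j] for j in range(k)])
    # Decision threshold halfway between the projected class means.
    threshold = sum(d * (means[1][j] + means[-1][j]) / 2 for j, d in enumerate(direction))
    return feats, direction, threshold


def predict_lda(model, x):
    feats, direction, threshold = model
    score = sum(d * x[f] for d, f in zip(direction, feats))
    return 1 if score >= threshold else -1


def boost_lda(X, y, n_rounds=20, subset_size=2, seed=0):
    """Discrete AdaBoost with LDA weak learners on random feature subsets."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        feats = sorted(rng.sample(range(p), subset_size))
        model = fit_weighted_lda(X, y, w, feats)
        preds = [predict_lda(model, x) for x in X]
        err = sum(wi for wi, pi, yi in zip(w, preds, y) if pi != yi)
        if err >= 0.5:
            continue  # no better than chance under current weights: discard, draw a new subset
        err = max(err, 1e-10)  # avoid division by zero for a perfect weak learner
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_ensemble(ensemble, x):
    vote = sum(alpha * predict_lda(model, x) for alpha, model in ensemble)
    return 1 if vote >= 0 else -1


# Toy data: feature 0 separates the classes, features 1-3 are pure noise.
rng = random.Random(1)
X, y = [], []
for i in range(100):
    label = 1 if i % 2 == 0 else -1
    X.append([2.0 * label + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)])
    y.append(label)
ens = boost_lda(X, y, n_rounds=20, subset_size=2, seed=0)
acc = sum(predict_ensemble(ens, x) == yi for x, yi in zip(X, y)) / len(X)
print(len(ens), acc)
```

Drawing a feature subset per round is what makes successive LDA learners differ; without it, reweighting alone tends to produce very similar linear boundaries, which is one reason plain AdaBoost over LDA often adds little.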
Affiliation(s)
- Tomasz Arodź
- Institute of Computer Science, AGH University of Science and Technology, Kraków, Poland.