Reference Citation Analysis

Cited by Other Article(s)

Saad MN, Hamed M. Transcriptome-Wide Association Study Reveals New Molecular Interactions Associated with Melanoma Pathogenesis. Cancers (Basel) 2024;16:2517. [PMID: 39061157; PMCID: PMC11274789; DOI: 10.3390/cancers16142517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 06/27/2024] [Accepted: 07/09/2024] [Indexed: 07/28/2024]

Moorjani P, Hellenthal G. Methods for Assessing Population Relationships and History Using Genomic Data. Annu Rev Genomics Hum Genet 2023;24:305-332. [PMID: 37220313; PMCID: PMC11040641; DOI: 10.1146/annurev-genom-111422-025117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]

Zhu H, Li T, Zhao B. Statistical Learning Methods for Neuroimaging Data Analysis with Applications. Annu Rev Biomed Data Sci 2023;6:73-104. [PMID: 37127052; DOI: 10.1146/annurev-biodatasci-020722-100353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]

O’Shea RJ, Tsoka S, Cook GJR, Goh V. Sparse Regression in Cancer Genomics: Comparing Variable Selection and Predictions in Real World Data. Cancer Inform 2021;20:11769351211056298. [PMID: 34866896; PMCID: PMC8640984; DOI: 10.1177/11769351211056298] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/02/2021] [Accepted: 10/09/2021] [Indexed: 12/31/2022]

Abstract

BACKGROUND

Evaluation of gene interaction models in cancer genomics is challenging, as the true distribution is uncertain. Previous analyses have benchmarked models using synthetic data or databases of experimentally verified interactions, approaches which are susceptible to misrepresentation and incompleteness, respectively. The objectives of this analysis are to (1) provide a real-world data-driven approach for comparing the performance of genomic model inference algorithms, (2) compare the performance of LASSO, elastic net, best-subset selection, $L_0L_1$ penalisation and $L_0L_2$ penalisation in real genomic data and (3) compare algorithmic preselection according to performance in our benchmark datasets to algorithmic selection by internal cross-validation.
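For orientation, the penalties named above are commonly written as follows. This is the standard textbook form, not a formulation quoted from the abstract; the scaling of the loss term and the symbols $\lambda_0$, $\lambda_1$, $\lambda_2$ are assumptions chosen for illustration. The "$L_1L_2$" label that appears under RESULTS most plausibly refers to the elastic net.

```latex
\hat{\beta} = \arg\min_{\beta}\; \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 + P(\beta),
\qquad
P(\beta) =
\begin{cases}
\lambda_1 \lVert \beta \rVert_1 & \text{LASSO} \\
\lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2 & \text{elastic net } (L_1L_2) \\
\lambda_0 \lVert \beta \rVert_0 & \text{best-subset selection} \\
\lambda_0 \lVert \beta \rVert_0 + \lambda_1 \lVert \beta \rVert_1 & L_0L_1 \\
\lambda_0 \lVert \beta \rVert_0 + \lambda_2 \lVert \beta \rVert_2^2 & L_0L_2
\end{cases}
```

Here $\lVert \beta \rVert_0$ counts the non-zero coefficients, so the $L_0$ term controls how many variables are selected while the $L_1$ or $L_2$ term shrinks the coefficients that remain.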

METHODS

Five large ($n \approx 4000$) genomic datasets were extracted from Gene Expression Omnibus. 'Gold-standard' regression models were trained on subspaces of these datasets ($n \approx 4000$, $p = 500$). Penalised regression models were trained on small samples from these subspaces ($n \in \{25, 75, 150\}$, $p = 500$) and validated against the gold-standard models. Variable selection performance and out-of-sample prediction were assessed. Penalty 'preselection' according to test performance in the other 4 datasets was compared to selection by internal cross-validation error minimisation.
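The preselection step can be sketched as a leave-one-dataset-out lookup: for a target dataset, pick the penalty that performed best on the remaining datasets. The sketch below is a minimal illustration, not the authors' code. The function name `preselect_penalty`, the nested-dictionary layout of `scores`, and the choice to aggregate by arithmetic mean are all assumptions; the abstract does not say how performance across the other 4 datasets was combined.

```python
from statistics import mean


def preselect_penalty(scores, target):
    """Pick a penalty for `target` using only the other datasets.

    `scores` maps dataset name -> {penalty name -> test score}, where a
    higher score is better. The layout and the use of the mean are
    illustrative assumptions, not details taken from the study.
    """
    others = [name for name in scores if name != target]
    if not others:
        raise ValueError("need at least one dataset other than the target")

    # Only consider penalties that were scored on every other dataset.
    candidates = set.intersection(*(set(scores[name]) for name in others))
    if not candidates:
        raise ValueError("no penalty was scored on all other datasets")

    # Sort first so that ties are broken deterministically by name.
    return max(
        sorted(candidates),
        key=lambda penalty: mean(scores[name][penalty] for name in others),
    )
```

Calling `preselect_penalty(scores, "dataset_5")` would ignore every score recorded for `dataset_5` itself, which is what distinguishes preselection from internal cross-validation on the target sample.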

RESULTS

$L_1L_2$ penalisation achieved the highest cosine similarity between estimated coefficients and those of gold-standard models. $L_0L_2$-penalised models explained the greatest proportion of variance in test responses, though performance was unreliable in low signal:noise conditions. $L_0L_2$ also attained the highest overall median variable selection F1 score. Penalty preselection significantly outperformed selection by internal cross-validation in each of 3 examined metrics.
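The three metrics named above can be sketched in plain Python as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function names, the `tol` threshold for treating a coefficient as selected, and the convention of returning 0.0 for degenerate inputs are assumptions.

```python
import math


def cosine_similarity(estimated, gold):
    """Cosine of the angle between two coefficient vectors."""
    dot = sum(e * g for e, g in zip(estimated, gold))
    norm_e = math.sqrt(sum(e * e for e in estimated))
    norm_g = math.sqrt(sum(g * g for g in gold))
    if norm_e == 0.0 or norm_g == 0.0:
        return 0.0  # assumed convention when a vector is all zeros
    return dot / (norm_e * norm_g)


def selection_f1(estimated, gold, tol=0.0):
    """F1 score of the selected variables against the gold-standard support.

    A variable counts as selected when |coefficient| > tol.
    """
    est = {i for i, c in enumerate(estimated) if abs(c) > tol}
    ref = {i for i, c in enumerate(gold) if abs(c) > tol}
    true_pos = len(est & ref)
    if true_pos == 0:
        return 0.0  # assumed convention when nothing is correctly selected
    precision = true_pos / len(est)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


def variance_explained(y_true, y_pred):
    """Proportion of variance in test responses explained (R squared)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("test responses have zero variance")
    return 1.0 - ss_res / ss_tot
```

Cosine similarity rewards getting the direction of the coefficient vector right even when its scale is off, whereas the F1 score looks only at which variables are non-zero. That difference is consistent with one penalty leading on coefficient recovery and another on variable selection.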

CONCLUSIONS

This analysis explores a novel approach for comparisons of model selection approaches in real genomic data from 5 cancers. Our benchmarking datasets have been made publicly available for use in future research. Our findings support the use of $L_0L_2$ penalisation for structural selection and $L_1L_2$ penalisation for coefficient recovery in genomic data. Evaluation of learning algorithms according to observed test performance in external genomic datasets yields valuable insights into actual test performance, providing a data-driven complement to internal cross-validation in genomic regression tasks.
