Zietz M, Himmelstein DS, Kloster K, Williams C, Nagle MW, Greene CS. The probability of edge existence due to node degree: a baseline for network-based predictions. Gigascience 2024; 13:giae001. [PMID: 38323677 PMCID: PMC10848215 DOI: 10.1093/gigascience/giae001] [Received: 01/05/2023] [Revised: 09/25/2023] [Accepted: 01/02/2024] [Indexed: 02/08/2024] [Open Access]
Abstract
Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network's specific connections using network permutation to generate features that depend only on degree. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Researchers seeking to predict new or missing edges in biological networks should use our permutation approach to obtain a baseline for performance that may be nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
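As an illustration of the degree-preserving permutation idea described in the abstract, here is a minimal sketch in plain Python. It is not the API of the authors' xswap package; the function names (`xswap_permute`, `degrees`, `degree_only_feature`), the toy network, and the parameter defaults are all hypothetical, and the sketch assumes an undirected network without self-loops or duplicate edges. Each step picks two edges and exchanges their endpoints, which leaves every node's degree unchanged while scrambling the specific connections. Counting how often a node pair is connected across many permuted networks then gives one possible feature that depends only on degree.

```python
import random
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def xswap_permute(edges, n_swaps, seed=0):
    """Degree-preserving permutation of an undirected simple graph.

    Repeatedly picks two edges (a, b) and (c, d) and tries to replace
    them with (a, d) and (c, b). Attempts that would create a self-loop
    or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Flip the second edge half the time so both swap orientations occur.
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new1 == new2:
            continue  # would create a self-loop or two identical edges
        if new1 in edge_set or new2 in edge_set:
            continue  # would duplicate an existing edge
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def degree_only_feature(edges, n_perms=100, swaps_per_edge=10, seed=0):
    """Fraction of permuted networks in which each node pair is connected.

    Assumption for illustration: chaining permutations and using
    swaps_per_edge * len(edges) attempts per permutation is enough mixing.
    """
    counts = Counter()
    current = list(edges)
    for k in range(n_perms):
        current = xswap_permute(current, swaps_per_edge * len(current), seed=seed + k)
        counts.update(current)
    return {pair: c / n_perms for pair, c in counts.items()}


# Hypothetical toy network: node 0 is a hub, node 5 has degree 1.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4), (4, 5)]
permuted = xswap_permute(toy_edges, n_swaps=100)
feature = degree_only_feature(toy_edges, n_perms=50)
```

In this sketch, pairs involving high-degree nodes tend to receive larger feature values regardless of the original connections, which is the kind of degree-driven baseline the abstract recommends comparing against.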
Affiliation(s)
- Michael Zietz
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Daniel S Himmelstein
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Related Sciences, Denver, CO 80202, USA
- Kyle Kloster
  - Carbon, Inc., Redwood City, CA 94063, USA
  - Department of Computer Science, North Carolina State University, Raleigh, NC 27606, USA
- Christopher Williams
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Michael W Nagle
  - Internal Medicine Research Unit, Pfizer Worldwide Research, Development, and Medical, Cambridge, MA 02139, USA
  - Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc., Cambridge, MA 02139, USA
  - Human Biology Integration Foundation, Deep Human Biology Learning, Eisai Inc., Cambridge, MA 02140, USA
- Casey S Greene
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA