1
Shi W, Yang H, Xie L, Yin XX, Zhang Y. A review of machine learning-based methods for predicting drug-target interactions. Health Inf Sci Syst 2024; 12:30. [PMID: 38617016 PMCID: PMC11014838 DOI: 10.1007/s13755-024-00287-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 03/04/2024] [Indexed: 04/16/2024]
Abstract
The prediction of drug-target interactions (DTI) is a crucial preliminary stage in drug discovery and development, given the substantial risk of failure and the prolonged validation period associated with in vitro and in vivo experiments. In the contemporary landscape, various machine learning-based methods have emerged as indispensable tools for DTI prediction. This paper begins by placing emphasis on the data representation employed by these methods, delineating five representations for drugs and four for proteins. The methods are then categorized into traditional machine learning-based approaches and deep learning-based ones, with a discussion of representative approaches in each category and the introduction of a novel taxonomy for deep neural network models in DTI prediction. Additionally, we present a synthesis of commonly used datasets and evaluation metrics to facilitate practical implementation. In conclusion, we address current challenges and outline potential future directions in this research field.
Affiliation(s)
- Wen Shi
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
- Hong Yang
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- Linhai Xie
  - State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing, 102206 China
- Xiao-Xia Yin
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- Yanchun Zhang
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
  - Department of New Networks, Peng Cheng Laboratory, Shenzhen, 518000 China
2
Anthonimuthu DJ, Hejlesen O, Zwisler ADO, Udsen FW. Application of Machine Learning in Multimorbidity Research: Protocol for a Scoping Review. JMIR Res Protoc 2024; 13:e53761. [PMID: 38767948 PMCID: PMC11148516 DOI: 10.2196/53761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 03/15/2024] [Accepted: 04/02/2024] [Indexed: 05/22/2024]
Abstract
BACKGROUND: Multimorbidity, defined as the coexistence of multiple chronic conditions, poses significant challenges to health care systems on a global scale. It is associated with increased mortality, reduced quality of life, and increased health care costs. The burden of multimorbidity is expected to worsen if no effective intervention is taken. Machine learning has the potential to assist in addressing these challenges since it offers advanced analysis and decision-making capabilities, such as disease prediction, treatment development, and clinical strategies.
OBJECTIVE: This paper presents the protocol of a scoping review that aims to identify and explore the current literature concerning the use of machine learning for patients with multimorbidity. More precisely, the objective is to identify the machine learning models used, the patient groups involved, the features considered, the types of input data, the maturity of the machine learning algorithms, and the outcomes from these machine learning models.
METHODS: The scoping review will be based on the guidelines of the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews). Five databases (PubMed, Embase, IEEE, Web of Science, and Scopus) were chosen for the literature search. Two reviewers will independently screen the titles, abstracts, and full texts of identified studies based on predefined eligibility criteria. Covidence (Veritas Health Innovation Ltd) will be used as a tool for managing and screening papers. Only studies that examine more than 1 chronic disease or individuals with a single chronic condition at risk of developing another will be included in the scoping review. Data from the included studies will be collected using Microsoft Excel (Microsoft Corp). The focus of the data extraction will be on bibliographical information, objectives, study populations, types of input data, types of algorithm, performance, maturity of the algorithms, and outcome.
RESULTS: The screening process will be presented in a PRISMA-ScR flow diagram. The findings of the scoping review will be conveyed through a narrative synthesis. Additionally, data extracted from the studies will be presented in more comprehensive formats, such as charts or tables. The results will be presented in a forthcoming scoping review, which will be published in a peer-reviewed journal.
CONCLUSIONS: To our knowledge, this may be the first scoping review to investigate the use of machine learning in multimorbidity research. The goal of the scoping review is to summarize the literature on machine learning in patients with multiple chronic conditions, highlight different approaches, and potentially discover research gaps. The results will offer insights for future research within this field, contributing to developments that can enhance patient outcomes.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/53761.
Affiliation(s)
- Ole Hejlesen
  - Department of Health Science and Technology, Faculty of Medicine, Aalborg University, Gistrup, Denmark
- Ann-Dorthe Olsen Zwisler
  - Clinic for Rehabilitation and Palliative Medicine, Rigshospitalet, Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Flemming Witt Udsen
  - Department of Health Science and Technology, Faculty of Medicine, Aalborg University, Gistrup, Denmark
3
Fagbamigbe AF, Agrawal U, Azcoaga-Lorenzo A, MacKerron B, Özyiğit EB, Alexander DC, Akbari A, Owen RK, Lyons J, Lyons RA, Denaxas S, Kirk P, Miller AC, Harper G, Dezateux C, Brookes A, Richardson S, Nirantharakumar K, Guthrie B, Hughes L, Kadam UT, Khunti K, Abrams KR, McCowan C. Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland. PLoS One 2023; 18:e0294666. [PMID: 38019832 PMCID: PMC10686427 DOI: 10.1371/journal.pone.0294666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 11/07/2023] [Indexed: 12/01/2023]
Abstract
There is still limited understanding of how chronic conditions co-occur in patients with multimorbidity and what the consequences are for patients and the health care system. Most reported clusters of conditions have not considered the demographic characteristics of these patients during the clustering process. The study used data for all registered patients who were resident in Fife or Tayside, Scotland, and aged 25 years or more on 1st January 2000, and who were followed up until 31st December 2018. We used linked demographic information and secondary care electronic health records from 1st January 2000. Individuals with at least two of the 31 Elixhauser Comorbidity Index conditions were identified as having multimorbidity. Market basket analysis was used to cluster the conditions for the whole population and then repeatedly stratified by age, sex and deprivation. 318,235 individuals were included in the analysis, with 67,728 (21.3%) having multimorbidity. We identified five distinct clusters of conditions in the population with multimorbidity: alcohol misuse, cancer, obesity, renal failure, and heart failure. Clusters of long-term conditions differed by age, sex and socioeconomic deprivation, with some clusters not present for specific strata and others including additional conditions. These findings highlight the importance of considering demographic factors during both clustering analysis and intervention planning for individuals with multiple long-term conditions. By taking these factors into account, the healthcare system may be better equipped to develop tailored interventions that address the needs of complex patients.
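As an illustration of the market basket analysis named in the abstract above, the following minimal sketch computes the standard support, confidence, and lift measures for pairs of co-occurring conditions. It is not the authors' pipeline: the function name `pairwise_rules`, the `min_support` threshold, and the toy patient records are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(records, min_support=0.2):
    """Return (antecedent, consequent, support, confidence, lift) for condition pairs.

    records: list of lists, each holding the conditions recorded for one patient.
    Only pairs whose joint support reaches min_support are reported.
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for conditions in records:
        unique = sorted(set(conditions))  # ignore duplicates within one patient
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of patients with both conditions
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return rules


# Hypothetical toy records, NOT data from the study.
patients = [
    ["heart failure", "renal failure"],
    ["heart failure", "renal failure", "obesity"],
    ["obesity", "diabetes"],
    ["heart failure", "diabetes"],
    ["renal failure", "heart failure"],
]

for x, y, s, c, l in pairwise_rules(patients, min_support=0.5):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 indicates that two conditions co-occur more often than they would if they were independent. Stratifying by age, sex, or deprivation, as the abstract describes, would amount to running the same computation on each subgroup of records.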
Affiliation(s)
- Adeniyi Francis Fagbamigbe
  - School of Medicine, University of St Andrews, St Andrews, United Kingdom
  - Department of Epidemiology and Medical Statistics, University of Ibadan, Ibadan, Nigeria
  - Institute of Applied Health Sciences, University of Aberdeen, Aberdeen, United Kingdom
  - Research Methods and Evaluation Unit, Institute for Health & Wellbeing, Coventry University, Coventry, United Kingdom
- Utkarsh Agrawal
  - Nuffield Department of Primary Care Health Science, University of Oxford, Oxford, United Kingdom
- Amaya Azcoaga-Lorenzo
  - School of Medicine, University of St Andrews, St Andrews, United Kingdom
  - Hospital Rey Juan Carlos, Instituto de Investigación Sanitaria Fundación Jimenez Diaz, Madrid, Spain
- Briana MacKerron
  - School of Medicine, University of St Andrews, St Andrews, United Kingdom
- Eda Bilici Özyiğit
  - Centre for Medical Image Computing, Department of Computer Science, UCL, London, United Kingdom
- Daniel C. Alexander
  - Centre for Medical Image Computing, Department of Computer Science, UCL, London, United Kingdom
- Ashley Akbari
  - Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Rhiannon K. Owen
  - Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Jane Lyons
  - Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Ronan A. Lyons
  - Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Spiros Denaxas
  - Institute of Health Informatics, UCL, London, United Kingdom
  - British Heart Foundation Data Science Centre, London, United Kingdom
- Paul Kirk
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Ana Corina Miller
  - Centre for Public Health, Institute of Clinical Science, Queen’s University Belfast, Belfast, United Kingdom
- Gill Harper
  - Clinical Effectiveness Group, Wolfson Institute of Population Health, Queen Mary University of London, London, United Kingdom
- Carol Dezateux
  - Clinical Effectiveness Group, Wolfson Institute of Population Health, Queen Mary University of London, London, United Kingdom
- Anthony Brookes
  - Department of Genetics & Genome Biology, University of Leicester, Leicester, United Kingdom
- Sylvia Richardson
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Bruce Guthrie
  - Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Lloyd Hughes
  - School of Medicine, University of St Andrews, St Andrews, United Kingdom
- Umesh T. Kadam
  - Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
- Kamlesh Khunti
  - Diabetes Research Centre, University of Leicester, Leicester, United Kingdom
- Keith R. Abrams
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
- Colin McCowan
  - School of Medicine, University of St Andrews, St Andrews, United Kingdom
4
Aziz F, Slater LT, Bravo-Merodio L, Acharjee A, Gkoutos GV. Link prediction in complex network using information flow. Sci Rep 2023; 13:14660. [PMID: 37669983 PMCID: PMC10480459 DOI: 10.1038/s41598-023-41476-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 08/27/2023] [Indexed: 09/07/2023]
Abstract
Link prediction in complex networks has recently attracted a great deal of attention in diverse scientific domains, including the social and biological sciences. Given a snapshot of a network, the goal is to predict links that are missing in the network or that are likely to occur in the near future. This problem has both theoretical and practical significance: it not only helps us to identify missing links in a network more efficiently by avoiding expensive and time-consuming experimental processes, but also allows us to study the evolution of a network over time. To address the problem of link prediction, numerous attempts have been made in recent years to exploit the local and global topological properties of the network to predict missing links. In this paper, we use the parametrised matrix forest index (PMFI) to predict missing links in a network. We show that, for small parameter values, this index is linked to a heat diffusion process on a graph and therefore encodes geometric properties of the network. We then develop a framework that combines the PMFI with a local similarity index to predict missing links in the network. The framework is applied to numerous networks obtained from diverse domains, such as social, biological, and transport networks. The results show that the proposed method can predict missing links with higher accuracy than other state-of-the-art link prediction methods.
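To make the idea of scoring missing links with a matrix forest index concrete, the sketch below assumes the commonly used parametrised form S = (I + αL)^(-1), where L is the graph Laplacian and α > 0 is the parameter. The abstract does not give the formula, so this form, the function names `pmfi` and `rank_missing_links`, and the toy graph are assumptions for illustration. The paper's combination with a local similarity index is not reproduced here.

```python
import numpy as np


def pmfi(adjacency, alpha):
    """Assumed parametrised matrix forest index: S = (I + alpha * L)^(-1)."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A  # combinatorial graph Laplacian
    return np.linalg.inv(np.eye(len(A)) + alpha * L)


def rank_missing_links(adjacency, alpha=0.5):
    """Score every non-adjacent node pair by S[i, j], highest score first."""
    A = np.asarray(adjacency)
    S = pmfi(A, alpha)
    n = len(A)
    candidates = [
        (S[i, j], i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if A[i, j] == 0
    ]
    return sorted(candidates, reverse=True)


# Hypothetical toy graph: a triangle 0-1-2 with a tail 2-3-4.
toy = np.zeros((5, 5), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]:
    toy[u, v] = toy[v, u] = 1

for score, i, j in rank_missing_links(toy, alpha=0.5):
    print(f"({i}, {j}) score={score:.4f}")
```

Under this assumed form, a first-order expansion gives S ≈ I − αL for small α, which matches the first-order expansion of the heat kernel exp(−αL). This is one way to read the abstract's statement that the index is linked to heat diffusion for small parameter values.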
Affiliation(s)
- Furqan Aziz
  - School of Computing and Mathematical Sciences, University of Leicester, University Rd, Leicester, LE1 7RH, UK
  - Centre for Health Data Science, Birmingham, B15 2WB, UK
- Luke T Slater
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
  - Institute of Translational Medicine, University of Birmingham, Birmingham, B15 2TT, UK
  - Centre for Health Data Science, Birmingham, B15 2WB, UK
- Laura Bravo-Merodio
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
  - Institute of Translational Medicine, University of Birmingham, Birmingham, B15 2TT, UK
  - Centre for Health Data Science, Birmingham, B15 2WB, UK
- Animesh Acharjee
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
  - Institute of Translational Medicine, University of Birmingham, Birmingham, B15 2TT, UK
  - MRC Health Data Research UK (HDR UK), London, UK
  - Centre for Health Data Science, Birmingham, B15 2WB, UK
- Georgios V Gkoutos
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
  - Institute of Translational Medicine, University of Birmingham, Birmingham, B15 2TT, UK
  - NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospital Birmingham, Birmingham, B15 2WB, UK
  - MRC Health Data Research UK (HDR UK), London, UK
  - NIHR Experimental Cancer Medicine Centre, Birmingham, B15 2TT, UK
  - Centre for Health Data Science, Birmingham, B15 2WB, UK
  - Centre for Environmental Research & Advocacy, University of Birmingham, Birmingham, B15 2TT, UK
5
Khoushehgir F, Sulaimany S. Negative link prediction to reduce dropout in Massive Open Online Courses. Education and Information Technologies 2023; 28:1-20. [PMID: 36714444 PMCID: PMC9875174 DOI: 10.1007/s10639-023-11597-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 01/12/2023] [Indexed: 06/18/2023]
Abstract
In recent years, the rapid growth of Massive Open Online Courses (MOOCs) has attracted considerable research attention. One of the main challenges in MOOCs is the high dropout (low completion) rate. Early dropout prediction algorithms aim to help educational institutions retain students in a course. Several methods exist for identifying students who are likely to drop out. These methods are often based on supervised machine learning and require student activity records to train a prediction model on features extracted from the raw data. The performance of graph-based algorithms in discovering strong or weak relationships between entities from limited data, across various applications, encouraged us to apply them to this problem. The objective of this paper is to propose a novel, low-complexity method, a negative link prediction algorithm, that for the first time uses only network topological data for dropout prediction. The idea is based on the assumption that entities with similar network structures are more likely to establish or remove a relation. Therefore, we first convert the data into a graph, mapping entities (students and courses) to nodes and relationships (enrollment data) to links. We then use graph-based algorithms to predict student dropout using only enrollment data. The experimental results demonstrate that the proposed method achieves significant performance compared with the baseline methods. We also test the supervised link prediction idea and show competitive and promising results in this case as well. Finally, we present important future research directions to improve the results.
Affiliation(s)
- Fatemeh Khoushehgir
  - Department of IT and Computer Engineering, Azarbaijan Shahid Madani University, Tabriz, Iran
- Sadegh Sulaimany
  - Social and Biological Network Analysis Laboratory (SBNA), Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran
6
Structure information learning for neutral links in signed network embedding. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.102917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
7
Mean Received Resources Meet Machine Learning Algorithms to Improve Link Prediction Methods. Information 2022. [DOI: 10.3390/info13010035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/27/2023]
Abstract
The analysis of social networks has attracted a lot of attention during the last two decades. These networks are dynamic: links appear and disappear. Link prediction is the problem of inferring links that will appear in the future from the current state of the network. We use information from nodes and edges to calculate the similarity between users. The more similar two users are, the higher the probability that they will connect in the future. Similarity metrics therefore play an important role in the link prediction field. Because of their simplicity and flexibility, many authors have proposed metrics such as Jaccard, Adamic-Adar (AA), and Katz and evaluated them using the area under the curve (AUC). In this paper, we propose a new parameterized method to enhance the AUC value of link prediction metrics by combining them with the mean received resources (MRRs). Experiments show that the proposed method improves the performance of the state-of-the-art metrics. Moreover, we used machine learning algorithms to classify links and confirm the efficiency of the proposed combination.
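The similarity-based evaluation described in the abstract above can be sketched as follows. The example computes the Jaccard and Adamic-Adar indices from a training graph and evaluates each with the AUC convention common in link prediction: the probability that a held-out true link scores higher than a non-existent link, counting ties as one half. The helper names and the toy graph are hypothetical, and the paper's MRR combination is not reproduced because the abstract does not define it.

```python
import math


def neighbors(edges):
    """Build an undirected neighbor map from a list of (u, v) edges."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def jaccard(nbrs, u, v):
    """Shared neighbors divided by the union of both neighborhoods."""
    nu, nv = nbrs.get(u, set()), nbrs.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def adamic_adar(nbrs, u, v):
    """Sum of 1 / log(degree) over shared neighbors (each has degree >= 2)."""
    shared = nbrs.get(u, set()) & nbrs.get(v, set())
    return sum(1.0 / math.log(len(nbrs[z])) for z in shared)


def auc(score, held_out, non_edges):
    """Fraction of (held-out link, non-edge) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p in held_out:
        for q in non_edges:
            sp, sq = score(*p), score(*q)
            total += 1.0 if sp > sq else 0.5 if sp == sq else 0.0
    return total / (len(held_out) * len(non_edges))


# Hypothetical toy network: the true link b-d is hidden from the training edges.
train = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
held_out = [("b", "d")]
non_edges = [("a", "d"), ("a", "e"), ("b", "e"), ("c", "e")]

nbrs = neighbors(train)
print("Jaccard AUC:", auc(lambda u, v: jaccard(nbrs, u, v), held_out, non_edges))
print("Adamic-Adar AUC:", auc(lambda u, v: adamic_adar(nbrs, u, v), held_out, non_edges))
```

An AUC of 0.5 corresponds to random ranking, so values further above 0.5 indicate a more useful similarity metric. On real networks the pairwise comparison is usually estimated by sampling rather than enumerated exhaustively as it is here.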