1
Cunningham M, Pins D, Dezső Z, Torrent M, Vasanthakumar A, Pandey A. PINNED: identifying characteristics of druggable human proteins using an interpretable neural network. J Cheminform 2023; 15:64. [PMID: 37468968 PMCID: PMC10354961 DOI: 10.1186/s13321-023-00735-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 07/10/2023] [Indexed: 07/21/2023]
Abstract
The identification of human proteins that are amenable to pharmacologic modulation without significant off-target effects remains an important unsolved challenge. Computational methods have been devised to identify features which distinguish between "druggable" and "undruggable" proteins, finding that protein sequence, tissue and cellular localization, biological role, and position in the protein-protein interaction network are all important discriminant factors. However, many prior efforts to automate the assessment of protein druggability suffer from low performance or poor interpretability. We developed a neural network-based machine learning model capable of generating druggability sub-scores based on each of four distinct categories, combining them to form an overall druggability score. The model achieves excellent performance in separating drugged and undrugged proteins in the human proteome, with an area under the receiver operating characteristic curve (AUC) of 0.95. Our use of multiple sub-scores allows the assessment of potential protein targets of interest based on distinct contributors to druggability, leading to a more interpretable and holistic model to identify novel targets.
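For readers unfamiliar with the AUC metric quoted in this abstract, the following is a minimal sketch of how it can be computed from labels and scores. It is not the authors' code; the function name `roc_auc` and the toy labels and scores are hypothetical. It uses the rank interpretation of AUC: the probability that a randomly chosen positive (drugged) example scores higher than a randomly chosen negative (undrugged) one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 class labels (1 = positive, e.g. "drugged").
    scores: iterable of model scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a sorting-based implementation would be preferable at proteome scale.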
Affiliation(s)
- Michael Cunningham
  - Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Danielle Pins
  - Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Zoltán Dezső
  - Genomics Research Center, AbbVie Inc., 1000 Gateway Boulevard, South San Francisco, CA, 94080, USA
- Maricel Torrent
  - Small Molecule Therapeutics and Platform Technologies, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Aparna Vasanthakumar
  - Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Abhishek Pandey
  - Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
2
Thafar MA, Albaradei S, Uludag M, Alshahrani M, Gojobori T, Essack M, Gao X. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features. Front Genet 2023; 14:1139626. [PMID: 37091791 PMCID: PMC10117673 DOI: 10.3389/fgene.2023.1139626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 03/24/2023] [Indexed: 04/08/2023]
Abstract
Late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. This is because effective targets have disease-relevant biological functions, and omics data unveil the proteins involved in these functions. Also, properties that favor the existence of binding between drug and target are deducible from the protein’s amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets using DL approaches. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, the proteins’ amino-acid sequence BERT embeddings, and the integrated features to train and test the DL classifiers separately. The models achieved high prediction performance in terms of area under the curve (AUC), i.e., AUC greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. Also, OncoRTT outperformed the state-of-the-art method using their data in five out of seven cancer types commonly assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
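As a rough illustration of the three feature sets this abstract mentions, the sketch below builds an "integrated" representation by concatenating each gene's omics features with its sequence embedding. The abstract does not say how the features were integrated, so concatenation is an assumption here; the function name `integrate_features` and the gene identifiers and values are hypothetical.

```python
def integrate_features(omics, embeddings):
    """Concatenate per-gene omics features with sequence embeddings.

    omics, embeddings: dicts mapping a gene identifier to a list of floats.
    Only genes present in both inputs are kept (an assumption of this sketch).
    """
    shared = sorted(set(omics) & set(embeddings))
    return {gene: list(omics[gene]) + list(embeddings[gene]) for gene in shared}


# Hypothetical toy inputs, not data from the paper.
toy_omics = {"GENE_A": [0.2, 1.5], "GENE_B": [0.7, 0.1]}
toy_embeddings = {"GENE_A": [0.01, -0.30, 0.12], "GENE_C": [0.5, 0.5, 0.5]}
toy_integrated = integrate_features(toy_omics, toy_embeddings)
```

Each of the three representations (omics only, embeddings only, integrated) could then be passed to its own classifier, matching the abstract's description of training and testing the classifiers separately.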
Affiliation(s)
- Maha A. Thafar
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia
- Somayah Albaradei
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Mahmut Uludag
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Mona Alshahrani
  - National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Takashi Gojobori
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Magbubah Essack
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - *Correspondence: Xin Gao; Magbubah Essack
- Xin Gao
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - *Correspondence: Xin Gao; Magbubah Essack
3
Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:12272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022]
Abstract
Medical discoveries mainly depend on the capability to process and analyze biological datasets, which inundate the scientific community and are still expanding as the cost of next-generation sequencing technologies decreases. Deep learning (DL) is a viable method to exploit this massive data stream since it has advanced quickly through successive innovations. However, an obstacle to scientific progress emerges: the difficulty of applying DL to biology. This is because both fields are evolving at a breakneck pace, making it hard for an individual to occupy the front lines of both. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. This work provides an overview of the most common types of biological data and data representations that are used to train DL models, with additional information on the models themselves and the various tasks that are being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. Alternatively, this study could also be useful to researchers in biology to understand and utilize the power of DL to gain better insights into and extract important information from omics data.
4
Vandersluis S, Reid JC, Orlando L, Bhatia M. Evidence-based support for phenotypic drug discovery in acute myeloid leukemia. Drug Discov Today 2022; 27:103407. [DOI: 10.1016/j.drudis.2022.103407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Revised: 08/01/2022] [Accepted: 10/10/2022] [Indexed: 11/03/2022]
5
Riedmayr LM, Hinrichsmeyer KS, Karguth N, Böhm S, Splith V, Michalakis S, Becirovic E. dCas9-VPR-mediated transcriptional activation of functionally equivalent genes for gene therapy. Nat Protoc 2022; 17:781-818. [PMID: 35132255 DOI: 10.1038/s41596-021-00666-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/01/2021] [Accepted: 11/18/2021] [Indexed: 12/19/2022]
Abstract
Many disease-causing genes possess functionally equivalent counterparts, which are often expressed in distinct cell types. An attractive gene therapy approach for inherited disorders caused by mutations in such genes is to transcriptionally activate the appropriate counterpart(s) to compensate for the missing gene function. This approach offers key advantages over conventional gene therapies because it is mutation- and gene size-independent. Here, we describe a protocol for the design, execution and evaluation of such gene therapies using dCas9-VPR. We offer guidelines on how to identify functionally equivalent genes, design and clone single guide RNAs and evaluate transcriptional activation in vitro. Moreover, focusing on inherited retinal diseases, we provide a detailed protocol on how to apply this strategy in mice using dual recombinant adeno-associated virus vectors and how to evaluate its functionality and off-target effects in the target tissue. This strategy is in principle applicable to all organisms that possess functionally equivalent genes suitable for transcriptional activation and addresses pivotal unmet needs in gene therapy with high translational potential. The protocol can be completed in 15-20 weeks.
Affiliation(s)
- Lisa M Riedmayr
  - Department of Pharmacy-Center for Drug Research, Ludwig-Maximilians-Universität München, Munich, Germany
- Klara S Hinrichsmeyer
  - Department of Pharmacy-Center for Drug Research, Ludwig-Maximilians-Universität München, Munich, Germany
- Nina Karguth
  - Department of Pharmacy-Center for Drug Research, Ludwig-Maximilians-Universität München, Munich, Germany
- Sybille Böhm
  - Department of Pharmacy-Center for Drug Research, Ludwig-Maximilians-Universität München, Munich, Germany
- Victoria Splith
  - Department of Pharmacy-Center for Drug Research, Ludwig-Maximilians-Universität München, Munich, Germany
- Stylianos Michalakis
  - Department of Ophthalmology, Ludwig-Maximilians-Universität München, Munich, Germany
- Elvir Becirovic
  - Department of Pharmacy-Center for Drug Research, Ludwig-Maximilians-Universität München, Munich, Germany
6
Manduchi E, Romano JD, Moore JH. The promise of automated machine learning for the genetic analysis of complex traits. Hum Genet 2021; 141:1529-1544. [PMID: 34713318 PMCID: PMC9360157 DOI: 10.1007/s00439-021-02393-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/21/2021] [Accepted: 10/22/2021] [Indexed: 12/24/2022]
Abstract
The genetic analysis of complex traits has been dominated by parametric statistical methods due to their theoretical properties, ease of use, computational efficiency, and intuitive interpretation. However, there are likely to be patterns arising from complex genetic architectures which are more easily detected and modeled using machine learning methods. Unfortunately, selecting the right machine learning algorithm and tuning its hyperparameters can be daunting for experts and non-experts alike. The goal of automated machine learning (AutoML) is to let a computer algorithm identify the right algorithms and hyperparameters thus taking the guesswork out of the optimization process. We review the promises and challenges of AutoML for the genetic analysis of complex traits and give an overview of several approaches and some example applications to omics data. It is our hope that this review will motivate studies to develop and evaluate novel AutoML methods and software in the genetics and genomics space. The promise of AutoML is to enable anyone, regardless of training or expertise, to apply machine learning as part of their genetic analysis strategy.
Affiliation(s)
- Elisabetta Manduchi
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Joseph D Romano
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Jason H Moore
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
7
Sherman J, Verstandig G, Brumer Y. Application of machine learning to large in-vitro databases to identify cancer cell characteristics: telomerase reverse transcriptase (TERT) expression. Oncogene 2021; 40:5038-5041. [PMID: 34135463 DOI: 10.1038/s41388-021-01894-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Revised: 05/21/2021] [Accepted: 06/04/2021] [Indexed: 02/08/2023]
Abstract
Advances in biotechnology and machine learning have created an enhanced environment for unearthing and exploiting previously unrecognized relationships between genomic and epigenetic data with potential therapeutic implications. We applied advanced algorithms to data from the Cancer Dependency Map to uncover increasingly complex relationships. Specifically, we investigate characteristics of tumor cell lines with varying levels of telomerase reverse transcriptase (TERT) expression in liver cancer. The findings indicate that the effect of CRISPR knockout of Histone Deacetylase 1 (HDAC1) and numerous individual respiratory complex I genes is strongly related to the level of TERT expression, with knockout being particularly efficacious at killing or inhibiting growth of tumor cells with low levels of TERT expression for HDAC1 and high levels for Complex I genes. These findings suggest key biomarkers for therapeutic efficacy and yield novel potential pathways for drug development and provide further proof of principle for the potential of artificial intelligence in oncology.
Affiliation(s)
- Jeff Sherman
  - Zephyr AI, Washington, DC, USA; Red Cell Partners, Washington, DC, USA
- Grant Verstandig
  - Zephyr AI, Washington, DC, USA; Red Cell Partners, Washington, DC, USA
- Yisroel Brumer
  - Zephyr AI, Washington, DC, USA; Red Cell Partners, Washington, DC, USA
8
Sherman J, Verstandig G, Rowe JW, Brumer Y. Application of machine learning to large in vitro databases to identify drug-cancer cell interactions: azithromycin and KLK6 mutation status. Oncogene 2021; 40:3766-3770. [PMID: 33953352 DOI: 10.1038/s41388-021-01807-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/17/2021] [Revised: 03/16/2021] [Accepted: 04/20/2021] [Indexed: 02/03/2023]
Abstract
Recent advances in machine learning promise to yield novel insights by interrogation of large datasets ranging from gene expression and mutation data to CRISPR knockouts and drug screens. We combined existing and new algorithms with available experimental data to identify potentially clinically relevant relationships to provide a proof of principle for the promise of machine learning in oncological drug discovery. Specifically, we screened cell line data from the Cancer Dependency Map for the effects of azithromycin, which has been shown to kill cancer cells in vitro. Our findings demonstrate a strong relationship between Kallikrein Related Peptidase 6 (KLK6) mutation status and the ability of azithromycin to kill cancer cells in vitro. While the application of azithromycin showed no meaningful average effect in KLK6 wild-type cell lines, statistically significant enhancements of cell death are seen in multiple independent KLK6-mutated cancer cell lines. These findings suggest a potentially valuable clinical strategy in patients with KLK6-mutated malignancies.
Affiliation(s)
- Jeff Sherman
  - Zephyr AI, Washington, DC, USA; Red Cell Partners, Washington, DC, USA
- Grant Verstandig
  - Zephyr AI, Washington, DC, USA; Red Cell Partners, Washington, DC, USA
- Yisroel Brumer
  - Zephyr AI, Washington, DC, USA; Red Cell Partners, Washington, DC, USA
9
A primer on applying AI synergistically with domain expertise to oncology. Biochim Biophys Acta Rev Cancer 2021; 1876:188548. [PMID: 33901609 DOI: 10.1016/j.bbcan.2021.188548] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 04/13/2021] [Accepted: 04/15/2021] [Indexed: 12/24/2022]
Abstract
BACKGROUND: The concurrent growth of large-scale oncology data alongside the computational methods with which to analyze and model it has created a promising environment for revolutionizing cancer diagnosis, treatment, prevention, and drug discovery. Computational methods applied to large datasets have accelerated the drug discovery process by reducing bottlenecks and widening the search space beyond what is experimentally tractable. As the research community gains understanding of the myriad genetic underpinnings of cancer via sequencing, imaging, screens, and more that are ingested, transformed, and modeled by top open-source machine learning and artificial intelligence tools readily available, the next big drug candidate might seem merely an "Enter" key away. Of course, the reality is more convoluted, but still promising.
SCOPE OF REVIEW: We present methods to approach the process of building an AI model, with strong emphasis on the aspects of model development we believe to be crucial to success but that are not commonly discussed: diligence in posing questions, identifying suitable datasets and curating them, and collaborating closely with biology and oncology experts while designing and evaluating the model. Digital pathology, Electronic Health Records, and other data types outside of high-throughput molecular data are reviewed well by others and are outside the scope of this review. This review emphasizes the importance of considering the limitations of the datasets, computational methods, and our minds when designing AI models. For example, datasets can be biased towards areas of research interest, funding, and particular patient populations. Neural networks may learn representations and correlations within the data that are grounded not in biological phenomena, but in statistical anomalies erroneously extracted from the training data. Researchers may mis-interpret or over-interpret the output, or design and evaluate the training process such that the resultant model generalizes poorly. Fortunately, awareness of the strengths and limitations of applying data analytics and AI to drug discovery enables us to leverage them carefully and insightfully while maximizing their utility. These applications, when performed in close collaboration with domain experts, together with continuous critical evaluation, generation of new data to minimize known blind spots as they are found, and rigorous experimental validation, increase the success rate of the study. We will discuss applications including AI-assisted target identification, drug repurposing, patient stratification, and gene prioritization.
MAJOR CONCLUSIONS: Data analytics and AI have demonstrated capabilities to revolutionize cancer research, prevention, and treatment by maximizing our understanding and use of the expanding panoply of experimental data. However, to separate promise from true utility, computational tools must be carefully designed, critically evaluated, and constantly improved. Once that is achieved, a human-computer hybrid discovery process will outperform one driven by each alone.
GENERAL SIGNIFICANCE: This review highlights the challenges and promise of synergizing predictive AI models with human expertise towards greater understanding of cancer.