1
Kumar H, Kim P. Artificial intelligence in fusion protein three-dimensional structure prediction: Review and perspective. Clin Transl Med 2024; 14:e1789. [PMID: 39090739 PMCID: PMC11294035 DOI: 10.1002/ctm2.1789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 07/16/2024] [Accepted: 07/19/2024] [Indexed: 08/04/2024] Open
Abstract
Recent advancements in artificial intelligence (AI) have accelerated the prediction of unknown protein structures. However, accurately predicting the three-dimensional (3D) structures of fusion proteins remains difficult because current AI-based protein structure predictions focus on wild-type (WT) proteins rather than on newly fused proteins in nature. Following the central dogma of biology, fusion proteins are translated from fusion transcripts, which are transcribed from fusion genes formed between two different loci by chromosomal rearrangements in cancer. Accurately predicting the 3D structures of fusion proteins is important for understanding the functional roles and mechanisms of action of new chimeric proteins. However, predicting their 3D structure using a template-based model is challenging because known template structures are often unavailable in databases. Deep learning (DL) models that utilize multi-level protein information have revolutionized the prediction of protein 3D structures. In this review, we highlight the latest advancements and ongoing challenges in predicting the 3D structure of fusion proteins using DL models. We explore both the advantages and challenges of employing AlphaFold2, RoseTTAFold, tr-Rosetta and D-I-TASSER for modelling the 3D structures.
HIGHLIGHTS:
- This review provides the overall pipeline and landscape of 3D structure prediction for fusion proteins.
- This review describes the factors to consider at each step when predicting the 3D structures of fusion proteins using AI approaches.
- This review highlights the latest advancements and ongoing challenges in predicting the 3D structure of fusion proteins using deep learning models.
- This review explores the advantages and challenges of employing AlphaFold2, RoseTTAFold, tr-Rosetta, and D-I-TASSER to model 3D structures.
Affiliation(s)
- Himansu Kumar
- Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Pora Kim
- Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
2
Sánchez-Marín D, Silva-Cázares MB, Porras-Reyes FI, García-Román R, Campos-Parra AD. Breaking paradigms: Long non-coding RNAs forming gene fusions with potential implications in cancer. Genes Dis 2024; 11:101136. [PMID: 38292185 PMCID: PMC10825296 DOI: 10.1016/j.gendis.2023.101136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 08/16/2023] [Accepted: 09/10/2023] [Indexed: 02/01/2024] Open
Abstract
Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nucleotides with dynamic regulatory functions. They interact with a wide range of molecules such as DNA, RNA, and proteins to modulate diverse cellular functions through several mechanisms and, if deregulated, can lead to cancer development and progression. Recently, lncRNAs have been shown to form gene fusions with mRNAs or other lncRNAs, breaking the paradigm that gene fusions consist mainly of protein-coding genes. However, their biological significance in the tumor phenotype is still uncertain. Their recent identification therefore opens a new line of research into their biological role in tumorigenesis and their potential as clinically relevant biomarkers or therapeutic targets. The present study aimed to review the lncRNA fusions identified so far and to determine which of them have been associated with a potential function. We address the current challenges to studying them in greater depth as well as the reasons why they represent a future therapeutic window in cancer.
Affiliation(s)
- David Sánchez-Marín
- Posgrado en Ciencias Biológicas, Facultad de Medicina, Universidad Nacional Autónoma de México, Ciudad de México, C.P. 04360, México
- Macrina Beatriz Silva-Cázares
- Unidad Académica Multidisciplinaria Región Altiplano, Universidad Autónoma de San Luis Potosí (UASLP), Carretera a Cedral Km 5+600, Ejido San José de la Trojes, Matehuala, San Luis Potosí, C.P. 78760, México
- Fany Iris Porras-Reyes
- Servicio de Anatomía Patológica, Instituto Nacional de Cancerología (INCan), Niño Jesús, Tlalpan, Ciudad de México, C.P. 14080, México
- Rebeca García-Román
- Instituto de Salud Pública, Universidad Veracruzana (UV), Av. Dr Luis, Dr. Castelazo Ayala s/n, Col. Industrial Ánimas, Xalapa, Veracruz, C.P. 91190, México
- Alma D. Campos-Parra
- Instituto de Salud Pública, Universidad Veracruzana (UV), Av. Dr Luis, Dr. Castelazo Ayala s/n, Col. Industrial Ánimas, Xalapa, Veracruz, C.P. 91190, México
3
Kumar H, Tang LY, Yang C, Kim P. FusionPDB: a knowledgebase of human fusion proteins. Nucleic Acids Res 2024; 52:D1289-D1304. [PMID: 37870473 PMCID: PMC10767906 DOI: 10.1093/nar/gkad920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/19/2023] [Accepted: 10/09/2023] [Indexed: 10/24/2023] Open
Abstract
Tumorigenic functions arising from the formation of fusion genes have been targeted by cancer therapeutics (e.g. kinase inhibitors). However, many fusion proteins involved in various cellular processes have not been studied for targeted therapeutics, because the lack of complete fusion protein sequences and their whole 3D structures has made it challenging to develop new therapeutic strategies. To fill these critical gaps, we developed a computational pipeline and a resource of human fusion proteins named FusionPDB, available at https://compbio.uth.edu/FusionPDB. FusionPDB is organized into four levels: 43K fusion protein sequences (14.7K in-frame fusion genes, Level 1), over 2300 + 1267 fusion protein 3D structures (from 2300 recurrent and 266 manually curated in-frame fusion genes, Level 2), pLDDT score analysis for the 1267 fusion proteins from 266 manually curated fusion genes (Level 3), and virtual screening outcomes for 68 selected fusion proteins from 266 manually curated fusion genes (Level 4). FusionPDB is the only resource providing whole 3D structures of fusion proteins and comprehensive knowledge of human fusion proteins. It will be regularly updated until it covers all human fusion proteins.
Affiliation(s)
- Himansu Kumar
- Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Lin-Ya Tang
- Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Chengyuan Yang
- School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Pora Kim
- Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
4
Haas BJ, Dobin A, Ghandi M, Van Arsdale A, Tickle T, Robinson JT, Gillani R, Kasif S, Regev A. Targeted in silico characterization of fusion transcripts in tumor and normal tissues via FusionInspector. CELL REPORTS METHODS 2023; 3:100467. [PMID: 37323575 PMCID: PMC10261907 DOI: 10.1016/j.crmeth.2023.100467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 02/28/2023] [Accepted: 04/14/2023] [Indexed: 06/17/2023]
Abstract
Here, we present FusionInspector for in silico characterization and interpretation of candidate fusion transcripts from RNA sequencing (RNA-seq) and exploration of their sequence and expression characteristics. We applied FusionInspector to thousands of tumor and normal transcriptomes and identified statistical and experimental features enriched among biologically impactful fusions. Through clustering and machine learning, we identified large collections of fusions potentially relevant to tumor and normal biological processes. We show that biologically relevant fusions are enriched for relatively high expression of the fusion transcript, imbalanced fusion allelic ratios, and canonical splicing patterns, and are deficient in sequence microhomologies between partner genes. We demonstrate that FusionInspector accurately validates fusion transcripts in silico and helps characterize numerous understudied fusions in tumor and normal tissue samples. FusionInspector is freely available as open source for screening, characterization, and visualization of candidate fusions via RNA-seq, and facilitates transparent explanation and interpretation of machine-learning predictions and their experimental sources.
Affiliation(s)
- Brian J. Haas
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Anne Van Arsdale
- Department of Obstetrics and Gynecology and Women’s Health, Albert Einstein Montefiore Medical Center, Bronx, NY 10461, USA
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Timothy Tickle
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- James T. Robinson
- School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Riaz Gillani
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA 02215, USA
- Boston Children’s Hospital, Boston, MA 02115, USA
- Simon Kasif
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Aviv Regev
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
5
Kumar H, Kim P. Computational design of DNA binding domain-retained fusion proteins and virtual screening against FDA-approved drugs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.05.539610. [PMID: 37214900 PMCID: PMC10197581 DOI: 10.1101/2023.05.05.539610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Even though transcription factors (TFs) are not regarded as good drug targets, mutated or dysregulated TFs can be a unique class of drug targets. Specifically, TF fusion proteins, which are translated from structural variants involving TFs, may affect downstream targets to promote tumorigenesis. To date, we lack the fusion protein sequence and 3D structure information needed to identify potential drugs for fusion proteins. In this study, we predicted the 3D structures of 732 transcription factor fusion proteins (TFFPs). For the five most frequent TFFPs, we performed virtual screening against FDA-approved drugs. Our study provides an initial platform for developing novel therapeutic targets among transcription factor fusion proteins.
6
Experimentally Deduced Criteria for Detection of Clinically Relevant Fusion 3′ Oncogenes from FFPE Bulk RNA Sequencing Data. Biomedicines 2022; 10:biomedicines10081866. [PMID: 36009413 PMCID: PMC9405289 DOI: 10.3390/biomedicines10081866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 07/15/2022] [Accepted: 07/29/2022] [Indexed: 11/25/2022] Open
Abstract
Drugs targeting receptor tyrosine kinase (RTK) oncogenic fusion proteins demonstrate impressive anti-cancer activities. The presence of a fusion in the cancer is the biomarker for prescribing the respective drug, but identifying fusions is challenging because both the breakpoint and the exact fusion partners are unknown. RNAseq offers the advantage of finding both fusion parts by screening sequencing reads. Formalin-fixed, paraffin-embedded (FFPE) tissue blocks are the most common way of storing cancer biomaterials in biobanks. However, finding RTK fusions in FFPE samples is challenging because RNA fragments are short and artifactual ligation products may appear in sequencing libraries. Here, we annotated RNAseq reads of 764 experimental FFPE solid cancer samples, 96 leukemia samples, and 2 cell lines, and identified 36 putative clinically relevant RTK fusions with junctions corresponding to exon borders of the fusion partners. Where possible, putative fusions were validated by RT-PCR (confirmed for 10/25 fusions tested). For the confirmed 3′RTK fusions, we observed the following distinguishing features. Both moieties were in-frame, and the tyrosine kinase domain was preserved. RTK exon coverage by RNAseq reads was lower upstream of the junction site than downstream. Finally, most of the true fusions were supported by more than one RNAseq read. This provides the basis for automatic annotation of 3′RTK fusions using FFPE RNAseq profiles.
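The four distinguishing features listed in this abstract can be read as a simple rule-based filter. The following is a minimal sketch only, not the authors' implementation: the `FusionCandidate` record, its field names, and the strict thresholds are hypothetical. In particular, the abstract says only that most true fusions had more than one supporting read, so treating that as a hard cutoff is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    """Hypothetical record for one putative 3'RTK fusion call."""
    in_frame: bool               # both moieties keep the reading frame
    kinase_domain_intact: bool   # tyrosine kinase domain is preserved
    upstream_coverage: float     # RTK exon read coverage upstream of the junction
    downstream_coverage: float   # RTK exon read coverage downstream of the junction
    junction_reads: int          # RNAseq reads supporting the junction


def passes_criteria(c: FusionCandidate) -> bool:
    """Return True only when all four features from the abstract hold."""
    return (
        c.in_frame
        and c.kinase_domain_intact
        and c.upstream_coverage < c.downstream_coverage
        and c.junction_reads > 1  # assumed hard cutoff, see lead-in
    )


# Made-up example values, for illustration only.
likely = FusionCandidate(True, True, 3.0, 42.0, 5)
artifact = FusionCandidate(True, True, 40.0, 38.0, 1)
print(passes_criteria(likely), passes_criteria(artifact))
```

A real annotation pipeline would more likely score these features than apply them as an all-or-nothing rule, since the abstract reports them as tendencies of RT-PCR-confirmed fusions.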
7
Kim P, Tan H, Liu J, Lee H, Jung H, Kumar H, Zhou X. FusionGDB 2.0: fusion gene annotation updates aided by deep learning. Nucleic Acids Res 2021; 50:D1221-D1230. [PMID: 34755868 PMCID: PMC8728198 DOI: 10.1093/nar/gkab1056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/14/2021] [Revised: 10/10/2021] [Accepted: 11/03/2021] [Indexed: 01/08/2023] Open
Abstract
A knowledgebase of the systematic functional annotation of fusion genes is critical for understanding genomic breakage context and developing therapeutic strategies. FusionGDB is a unique functional annotation database of human fusion genes and has been widely used for studies with diverse aims. In this study, we report fusion gene annotation updates aided by deep learning (FusionGDB 2.0), available at https://compbio.uth.edu/FusionGDB2/. FusionGDB 2.0 has substantial content updates: up-to-date human fusion genes; a fusion gene breakage tendency score from the FusionAI deep learning model, based on the 20 kb DNA sequence around the breakpoint (BP); an investigation of the overlap between fusion breakpoints and 44 human genomic features across five categories of cellular role; transcribed chimeric sequences followed by open reading frame analysis, with coding potential estimated by a deep learning approach using Ribo-seq read features; and a rigorous investigation of protein feature retention for individual fusion partner genes at the protein level. Among ∼102k fusion genes, about 15k kept their ORF in-frame, twice as many as in the previous version, FusionGDB. FusionGDB 2.0 will serve as the reference knowledgebase of fusion gene annotations. It provides eight categories of annotations and will be helpful for diverse human genomic studies.
Affiliation(s)
- Pora Kim
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Hua Tan
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Jiajia Liu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Haeseung Lee
- Intellectual Information Team, Future Medicine Division, Korea Institute of Oriental Medicine, Daejeon, South Korea
- Hyesoo Jung
- Department of Neurology, Asan Medical Center, Seoul, Korea
- Himanshu Kumar
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Xiaobo Zhou
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
8
FusionAI: Predicting fusion breakpoint from DNA sequence with deep learning. iScience 2021; 24:103164. [PMID: 34646994 PMCID: PMC8501764 DOI: 10.1016/j.isci.2021.103164] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/11/2021] [Revised: 07/16/2021] [Accepted: 09/21/2021] [Indexed: 12/12/2022] Open
Abstract
Identifying the molecular mechanisms related to genomic breakage is an important goal of cancer mechanism studies. Among the diverse locations of structural variants, fusion genes, which have breakpoints in the gene bodies and are typically identified from the split reads of RNA-seq data, provide a prominent structural variant resource for studying genomic breakage together with its expression and potential pathogenic impact. In this study, we developed FusionAI, which uses deep learning to predict gene fusion breakpoints from DNA sequence and lets us identify the fusion breakage code and genomic context. FusionAI leverages known fusion breakpoints to build a deep learning model that predicts fusion genes from the primary genomic sequence, thereby helping researchers select fusion genes more accurately and better understand genomic breakage.
HIGHLIGHTS:
- FusionAI predicts fusion gene breakpoints from a DNA sequence.
- FusionAI reduces the effort of validating fusion genes with other tools.
- High feature importance regions were located 100 nt away from the exon junction breakpoints (BPs).
- High feature importance regions overlapped with 44 human genomic features.