1
Arendt-Tranholm A, Mwirigi JM, Price TJ. RNA isoform expression landscape of the human dorsal root ganglion generated from long-read sequencing. Pain 2024:00006396-990000000-00606. [PMID: 38809314 DOI: 10.1097/j.pain.0000000000003255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 02/14/2024] [Indexed: 05/30/2024]
Abstract
Splicing is a posttranscriptional RNA processing mechanism that enhances genomic complexity by creating multiple isoforms from the same gene. We aimed to characterize the isoforms expressed in the human peripheral nervous system, with the goal of creating a resource to identify novel isoforms of functionally relevant genes associated with somatosensation and nociception. We used long-read sequencing to document isoform expression in the human dorsal root ganglia from 3 organ donors and validated the isoforms in silico by confirming their expression in short-read sequencing data from 3 independent organ donors. Nineteen thousand five hundred forty-seven isoforms of protein-coding genes were detected and validated. We identified 763 isoforms with at least one previously undescribed splice junction. Previously unannotated isoforms of multiple pain-associated genes, including ASIC3, MRGPRX1, and HNRNPK, were identified. In the novel isoforms of ASIC3, a region comprising approximately 35% of the 5'UTR was excised. By contrast, a novel splice junction was used in isoforms of MRGPRX1 to include an additional exon upstream of the start codon, consequently adding a region to the 5'UTR. Novel isoforms of HNRNPK were identified, which used previously unannotated splice sites to both excise exon 14 and include a sequence in the 3' end of exon 13. This novel insertion is predicted to introduce a tyrosine phosphorylation site potentially phosphorylated by SRC. We also independently confirm a recently reported DRG-specific splicing event in WNK1 that gives insight into how painless peripheral neuropathy occurs when this gene is mutated. Our findings give a clear overview of mRNA isoform diversity in the human dorsal root ganglia obtained using long-read sequencing.
Affiliation(s)
- Asta Arendt-Tranholm
- Department of Neuroscience and Center for Advanced Pain Studies, School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, United States
2
Chu Y, Yu D, Li Y, Huang K, Shen Y, Cong L, Zhang J, Wang M. A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions. Nat Mach Intell 2024; 6:449-460. [PMID: 38855263 PMCID: PMC11155392 DOI: 10.1038/s42256-024-00823-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 03/07/2024] [Indexed: 06/11/2024]
Abstract
The 5' UTR, a regulatory region at the beginning of an mRNA molecule, plays a crucial role in regulating translation and affects protein expression levels. Language models have shown their effectiveness in decoding the functions of protein and genome sequences. Here, we introduce a language model for the 5' UTR, which we refer to as UTR-LM. UTR-LM is pre-trained on endogenous 5' UTRs from multiple species and is further augmented with supervised information, including secondary structure and minimum free energy. We fine-tuned UTR-LM on a variety of downstream tasks. The model outperformed the best-known benchmark by up to 5% for predicting mean ribosome loading, and by up to 8% for predicting translation efficiency and mRNA expression level. The model also applies to identifying unannotated internal ribosome entry sites (IRESs) within the untranslated region and improves the AUPR from 0.37 to 0.52 compared to the best baseline. Further, we designed a library of 211 novel 5' UTRs with high predicted translation efficiency and evaluated them via a wet-lab assay. Experimental results confirmed that our top designs achieved a 32.5% increase in protein production relative to a well-established 5' UTR optimized for therapeutics.
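The AUPR values quoted above (0.37 vs. 0.52) measure how well a classifier ranks true IRES-containing sequences above the rest. The Python sketch below is not the UTR-LM authors' evaluation code; it is a minimal, hypothetical illustration of the common average-precision estimate of AUPR, assuming binary labels (1 = IRES, 0 = non-IRES) and one predicted score per sequence, with ties broken by input order.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Estimate AUPR as average precision (hypothetical helper, illustration only).

    Sequences are ranked by descending score; precision is recorded at the rank
    of every positive, and the mean of those precisions is returned.
    Ties keep their input order because Python's sort is stable.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this recall step
    return precision_sum / n_pos


# Toy example: positives at ranks 1 and 3 give (1/1 + 2/3) / 2.
print(round(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 3))  # 0.833
```

A random ranking has an expected AUPR close to the fraction of positives, so the baseline for this metric depends on class balance; scikit-learn's `average_precision_score` provides a comparable estimate.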
Affiliation(s)
- Yanyi Chu
- Center for Statistics and Machine Learning and Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Dan Yu
- RVAC Medicines, Waltham, MA 02451, USA
- Yupeng Li
- RVAC Medicines, Waltham, MA 02451, USA
- Kaixuan Huang
- Center for Statistics and Machine Learning and Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA
- Yue Shen
- RVAC Medicines, Waltham, MA 02451, USA
- Le Cong
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Mengdi Wang
- Center for Statistics and Machine Learning and Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA
3
Zhang B, Zhang H, Wang Z, Cao H, Zhang N, Dai Z, Liang X, Peng Y, Wen J, Zhang X, Zhang L, Luo P, Zhang J, Liu Z, Cheng Q, Peng R. The regulatory role and clinical application prospects of circRNA in the occurrence and development of CNS tumors. CNS Neurosci Ther 2024; 30:e14500. [PMID: 37953502 PMCID: PMC11017455 DOI: 10.1111/cns.14500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 09/20/2023] [Accepted: 10/03/2023] [Indexed: 11/14/2023]
Abstract
BACKGROUND Central nervous system (CNS) tumors originate in the brain or spinal cord. Studies have shown that, even with aggressive treatment, malignant CNS tumors have high mortality rates. However, the risk factors and molecular mechanisms of CNS tumors have not been fully established. For these reasons, the diagnosis and treatment of CNS tumors in clinical practice remain fraught with difficulties. Circular RNAs (circRNAs), single-stranded ncRNAs with covalently closed continuous structures, are essential to CNS tumor development. Growing evidence has revealed numerous critical biological functions of circRNAs in disease progression: sponging miRNAs, regulating gene transcription and splicing, interacting with proteins, encoding proteins/peptides, and being present in exosomes. AIMS This review aims to summarize current progress regarding the molecular mechanisms of circRNAs in CNS tumors and to explore the possibilities for circRNA-based clinical applications in CNS tumors. METHODS We summarized studies of circRNAs in CNS tumors indexed in PubMed. RESULTS This review summarizes the connection of circRNAs with CNS tumors, as well as their functions, biogenesis, and biological properties. Furthermore, we introduce current advances in clinical RNA-related technologies. We then discuss the diagnostic and therapeutic potential (especially for immunotherapy, chemotherapy, and radiotherapy) of circRNAs in CNS tumors in the context of recent research on, and clinical applications of, RNA. CONCLUSIONS CircRNAs are increasingly shown to participate in the development of CNS tumors. An in-depth study of the causal mechanisms of circRNAs in CNS tumor progression will ultimately advance their clinical implementation and the development of new strategies for preventing and treating CNS tumors.
Affiliation(s)
- Bo Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Hao Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, China
- Zeyu Wang
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- MRC Centre for Regenerative Medicine, Institute for Regeneration and Repair, University of Edinburgh, Edinburgh, UK
- Hui Cao
- Department of Psychiatry, The School of Clinical Medicine, Hunan University of Chinese Medicine, Changsha, China
- Nan Zhang
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Ziyu Dai
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Xisong Liang
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Yun Peng
- Teaching and Research Section of Clinical Nursing, Xiangya Hospital of Central South University, Changsha, China
- Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, China
- Jie Wen
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Xun Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Liyang Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Peng Luo
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Jian Zhang
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Zaoqu Liu
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Quan Cheng
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Renjun Peng
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
4
Karlin DG. Parvovirus B19 and Human Parvovirus 4 Encode Similar Proteins in a Reading Frame Overlapping the VP1 Capsid Gene. Viruses 2024; 16:191. [PMID: 38399966 PMCID: PMC10891878 DOI: 10.3390/v16020191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 01/12/2024] [Accepted: 01/24/2024] [Indexed: 02/25/2024]
Abstract
Viruses frequently contain overlapping genes, which encode functionally unrelated proteins from the same DNA or RNA region but in different reading frames. Yet overlapping genes are often overlooked during genome annotation, particularly in DNA viruses. Here we looked for overlapping genes likely to encode a functional protein in human parvovirus B19 (genus Erythroparvovirus), using Synplot2, an experimentally validated software tool. Synplot2 detected an open reading frame, X, conserved in all erythroparvoviruses, which overlaps the VP1 capsid gene and is under highly significant selection pressure. In a related virus, human parvovirus 4 (genus Tetraparvovirus), Synplot2 also detected an open reading frame under highly significant selection pressure, ARF1, which overlaps the VP1 gene and is conserved in all tetraparvoviruses. These findings provide compelling evidence that the X and ARF1 proteins must be expressed and functional. X and ARF1 occupy exactly the same location (they overlap the region of the VP1 gene encoding the phospholipase A2 domain), are both in the same frame (+1) with respect to the VP1 frame, and encode proteins with similar predicted properties, including a central transmembrane region. Further studies will be needed to determine whether they have a common origin and similar function. X and ARF1 are probably translated either from a polycistronic mRNA by a non-canonical mechanism or from an unmapped monocistronic mRNA. Finally, we also discovered proteins predicted to be expressed from a frame overlapping VP1 in other species related to parvovirus B19: porcine parvovirus 2 (Z protein) and bovine parvovirus 3 (X-like protein).
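Synplot2 itself works on multiple sequence alignments, looking for reduced variability at synonymous sites, and cannot be reproduced in a few lines. The purely geometric part of the finding, an ORF read one nucleotide downstream of the VP1 frame (frame +1), can be illustrated with the hypothetical Python sketch below. The function name, the `min_codons` threshold and the toy sequence are assumptions for illustration, not values from the paper, and an ATG-only scan ignores the non-canonical initiation the abstract suggests for X and ARF1.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orfs_in_plus1_frame(seq: str, cds_start: int, cds_end: int,
                        min_codons: int = 30) -> list[tuple[int, int]]:
    """Find ATG-initiated ORFs in the +1 frame of a reference CDS (hypothetical helper).

    The reference CDS (e.g. VP1) occupies seq[cds_start:cds_end]; the +1 frame
    starts one nucleotide downstream of the CDS start. Each ORF is returned as a
    0-based, end-exclusive (start, end) pair that includes the stop codon. ORFs
    must start inside the CDS but may extend past its end.
    """
    seq = seq.upper()
    orfs = []
    i = cds_start + 1
    while i + 3 <= cds_end:
        if seq[i:i + 3] == "ATG":
            j = i
            # Extend codon by codon until an in-frame stop codon or the sequence end.
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                j += 3
            if j + 3 > len(seq):
                break  # no stop codon downstream: open-ended, not reported
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            # Nested ATGs share the same stop and are shorter, so skip past it.
            i = j + 3
            continue
        i += 3
    return orfs


# Toy sequence: frame 0 (the hypothetical "VP1" frame) has no ATG, while the
# +1 frame holds ATG-AAA-AAA-TAA.
print(orfs_in_plus1_frame("CATGAAAAAATAAGG", cds_start=0, cds_end=15, min_codons=1))  # [(1, 13)]
```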
Affiliation(s)
- David G. Karlin
- Division Phytomedicine, Thaer-Institute of Agricultural and Horticultural Sciences, Humboldt-Universität zu Berlin, Lentzeallee 55/57, D-14195 Berlin, Germany
- Independent Researcher, 13000 Marseille, France
5
Dueñas Rey A, Del Pozo Valero M, Bouckaert M, Wood KA, Van den Broeck F, Daich Varela M, Thomas HB, Van Heetvelde M, De Bruyne M, Van de Sompele S, Bauwens M, Lenaerts H, Mahieu Q, Josifova D, Rivolta C, O'Keefe RT, Ellingford J, Webster AR, Arno G, Ayuso C, De Zaeytijd J, Leroy BP, De Baere E, Coppieters F. Combining a prioritization strategy and functional studies nominates 5'UTR variants underlying inherited retinal disease. Genome Med 2024; 16:7. [PMID: 38184646 PMCID: PMC10771650 DOI: 10.1186/s13073-023-01277-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 12/15/2023] [Indexed: 01/08/2024]
Abstract
BACKGROUND 5' untranslated regions (5'UTRs) are essential modulators of protein translation. Predicting the impact of 5'UTR variants is challenging and is rarely performed in routine diagnostics. Here, we present a combined approach of a comprehensive prioritization strategy and functional assays to evaluate 5'UTR variation in two large cohorts of patients with inherited retinal diseases (IRDs). METHODS We performed an isoform-level re-analysis of retinal RNA-seq data to identify the protein-coding transcripts of 378 IRD genes with the highest expression in the retina. We evaluated the coverage of their 5'UTRs by different whole exome sequencing (WES) kits. The selected 5'UTRs were analyzed in whole genome sequencing (WGS) and WES data from IRD sub-cohorts from the 100,000 Genomes Project (n = 2397 WGS) and an in-house database (n = 1682 WES), respectively. Identified variants were annotated for 5'UTR-relevant features and classified into seven categories based on their predicted functional consequence. We developed a variant prioritization strategy by integrating population frequency, specific criteria for each category, and family and phenotypic data. A selection of candidate variants underwent functional validation using diverse approaches. RESULTS Isoform-level re-quantification of retinal gene expression revealed 76 IRD genes with a non-canonical retina-enriched isoform, of which 20 display a fully distinct 5'UTR compared to that of their canonical isoform. Depending on the probe design, 3-20% of IRD genes have 5'UTRs fully captured by WES. After analyzing these regions in both cohorts, we prioritized 11 (likely) pathogenic variants in 10 genes (ARL3, MERTK, NDP, NMNAT1, NPHP4, PAX6, PRPF31, PRPF4, RDH12, RD3), of which 7 were novel. Functional analyses further supported the pathogenicity of three variants. Mis-splicing was demonstrated for the PRPF31:c.-9+1G>T variant. The MERTK:c.-125G>A variant, overlapping a transcriptional start site, was shown to significantly reduce both luciferase mRNA levels and activity. The RDH12:c.-123C>T variant was found in cis with the hypomorphic RDH12:c.701G>A (p.Arg234His) variant in 11 patients. This 5'UTR variant, predicted to introduce an upstream open reading frame, was shown to result in reduced RDH12 protein but unaltered mRNA levels. CONCLUSIONS This study demonstrates the importance of 5'UTR variants implicated in IRDs and provides a systematic approach for 5'UTR annotation and validation that is applicable to other inherited diseases.
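The RDH12:c.-123C>T finding rests on a simple sequence rule: a new upstream ATG in the 5'UTR can start an upstream ORF (uORF) that competes with the main start codon. A minimal Python sketch, not the authors' annotation pipeline, classifies every upstream ATG in a transcript as a uORF (stop codon before the main CDS), an overlapping uORF (out of frame, no stop before the CDS start), or an N-terminal extension (in frame with the CDS). The function name and toy sequences are hypothetical and are not the RDH12 sequence; real pipelines also weigh Kozak context and other features.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_upstream_atgs(transcript: str, cds_start: int) -> list[tuple[int, str]]:
    """Classify ATGs located upstream of the annotated start codon (hypothetical helper).

    transcript: mRNA sequence written with DNA letters (5'->3').
    cds_start:  0-based index of the A of the main ATG start codon.
    """
    seq = transcript.upper()
    results = []
    for pos in range(cds_start):
        if seq[pos:pos + 3] != "ATG":
            continue
        in_frame = (cds_start - pos) % 3 == 0
        stop_end = None
        # Walk codon by codon from the upstream ATG until the first stop codon.
        for codon_start in range(pos, len(seq) - 2, 3):
            if seq[codon_start:codon_start + 3] in STOP_CODONS:
                stop_end = codon_start + 3
                break
        if stop_end is not None and stop_end <= cds_start:
            kind = "uORF"  # terminates before the main start codon
        elif in_frame:
            kind = "N-terminal extension"  # reads straight into the main CDS
        else:
            kind = "overlapping uORF"  # out of frame and runs past the main start codon
        results.append((pos, kind))
    return results


# Toy reference vs. alternate 5'UTR: a C>T change turns ACG into ATG.
ref = "GGACGAAATAGCCATGGCCTAA"
alt = "GGATGAAATAGCCATGGCCTAA"
print(classify_upstream_atgs(ref, cds_start=13))  # []
print(classify_upstream_atgs(alt, cds_start=13))  # [(2, 'uORF')]
```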
Affiliation(s)
- Alfredo Dueñas Rey
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Marta Del Pozo Valero
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Department of Genetics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz, University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Manon Bouckaert
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Katherine A Wood
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
- Filip Van den Broeck
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Department of Head & Skin, Ghent University, Ghent, Belgium
- Malena Daich Varela
- UCL Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital, London, UK
- Huw B Thomas
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
- Mattias Van Heetvelde
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Marieke De Bruyne
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Stijn Van de Sompele
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Miriam Bauwens
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Hanne Lenaerts
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Quinten Mahieu
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Carlo Rivolta
- Department of Ophthalmology, University of Basel, Basel, Switzerland
- Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Raymond T O'Keefe
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
- Jamie Ellingford
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
- Genomics England, London, UK
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, UK
- Andrew R Webster
- UCL Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital, London, UK
- Gavin Arno
- UCL Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital, London, UK
- Carmen Ayuso
- Department of Genetics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz, University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Julie De Zaeytijd
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Department of Head & Skin, Ghent University, Ghent, Belgium
- Bart P Leroy
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Department of Head & Skin, Ghent University, Ghent, Belgium
- Division of Ophthalmology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Elfride De Baere
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Frauke Coppieters
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Department of Pharmaceutics, Ghent University, Ghent, Belgium
6
Silonov SA, Smirnov EY, Kuznetsova IM, Turoverov KK, Fonin AV. PML Body Biogenesis: A Delicate Balance of Interactions. Int J Mol Sci 2023; 24:16702. [PMID: 38069029 PMCID: PMC10705990 DOI: 10.3390/ijms242316702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 11/21/2023] [Accepted: 11/22/2023] [Indexed: 12/18/2023]
Abstract
PML bodies are subnuclear protein complexes that play a crucial role in various physiological and pathological cellular processes. One of the key structural proteins of PML bodies is the promyelocytic leukemia protein (PML), a member of the tripartite motif (TRIM) family. PML is known to interact with over a hundred partners, and the protein itself exists as several major isoforms that, owing to alternative splicing, differ in their variable and disordered C-terminal ends. Despite nearly 30 years of research, the mechanisms underlying PML body formation and the role of PML proteins in this process remain largely unclear. In this review, we examine the literature and highlight recent progress in this field, with a particular focus on the roles of individual domains of the PML protein, its post-translational modifications, and polyvalent nonspecific interactions in the formation of PML bodies. Additionally, based on the available literature, we propose a new hypothetical model of PML body formation.
Affiliation(s)
- Sergey A. Silonov
- Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg 194064, Russia
- Alexander V. Fonin
- Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg 194064, Russia
7
Lin HH, Chang CY, Huang YR, Shen CH, Wu YC, Chang KL, Lee YC, Lin YC, Ting WC, Chien HJ, Zheng YF, Lai CC, Hsiao KY. Exon Junction Complex Mediates the Cap-Independent Translation of Circular RNA. Mol Cancer Res 2023; 21:1220-1233. [PMID: 37527157 DOI: 10.1158/1541-7786.mcr-22-0877] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/20/2023] [Revised: 06/22/2023] [Accepted: 07/25/2023] [Indexed: 08/03/2023]
Abstract
Evidence that circular RNAs (circRNAs) serve as protein templates is accumulating. However, how their cap-independent translation is controlled remains largely uncharacterized. Here, we show that the presence of an intron, and thus splicing, promotes cap-independent translation. Acquisition of the exon junction complex (EJC) after splicing promoted the interaction between circRNAs and ribosomes, thereby facilitating translation. Preventing splicing, either by treatment with a spliceosome inhibitor or by mutating the splicing signals, hindered cap-independent translation of circRNAs. Moreover, EJC tethering using Cas13 technology reconstituted EJC-dependent circRNA translation. Finally, the level of a coding circRNA from succinate dehydrogenase assembly factor 2 (circSDHAF2) was found to be elevated in tumor tissues from patients with colorectal cancer and was shown to be critical for colorectal cancer tumorigenesis in both cell and murine models. These findings reveal that EJC-dependent control of circSDHAF2 translation is involved in the regulation of oncogenic pathways. IMPLICATIONS EJC-mediated cap-independent translation of circRNA is implicated in the tumorigenesis of colorectal cancer.
Affiliation(s)
- Hui-Hsuan Lin
- Doctoral Program in Tissue Engineering and Regenerative Medicine, National Chung Hsing University, Taichung, Taiwan
- Institute of Biochemistry, College of Life Sciences, National Chung Hsing University, Taichung, Taiwan
- Chiu-Yuan Chang
- Institute of Biochemistry, College of Life Sciences, National Chung Hsing University, Taichung, Taiwan
- Yi-Ren Huang
- Institute of Biochemistry, College of Life Sciences, National Chung Hsing University, Taichung, Taiwan
- Che-Hung Shen
- Doctoral Program in Tissue Engineering and Regenerative Medicine, National Chung Hsing University, Taichung, Taiwan
- National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan
- Yu-Chen Wu
- Institute of Biochemistry, College of Life Sciences, National Chung Hsing University, Taichung, Taiwan
- Kai-Li Chang
- Department of Physiology, National Cheng Kung University, Tainan, Taiwan
- Yueh-Chun Lee
- Department of Radiation Oncology, Chung Shan Medical University Hospital, Taichung, Taiwan
- School of Medicine, Chung Shan Medical University, Taichung, Taiwan
- Ya-Chi Lin
- Department of Plant Pathology, College of Agriculture and Natural Resources, National Chung Hsing University, Taichung, Taiwan
- Department of Medical Laboratory Science and Biotechnology, Asia University, Taichung, Taiwan
- Wen-Chien Ting
- Division of Colorectal Surgery, Department of Surgery, Chung Shan Medical University Hospital, Taichung, Taiwan
- Han-Ju Chien
- Department of Biochemical Science and Technology, National Chiayi University, Chiayi, Taiwan
- Yi-Feng Zheng
- Institute of Molecular Biology, College of Life Sciences, National Chung Hsing University, Taichung, Taiwan
- Chien-Chen Lai
- Institute of Molecular Biology, College of Life Sciences, National Chung Hsing University, Taichung, Taiwan
- Kuei-Yang Hsiao
- Doctoral Program in Tissue Engineering and Regenerative Medicine, National Chung Hsing University, Taichung, Taiwan
- Institute of Biochemistry, College of Life Sciences, National Chung Hsing University, Taichung, Taiwan
- Doctoral Program in Translational Medicine, College of Life Sciences, National Chung Hsing University, Taichung, Taiwan
- Rong Hsing Research Center for Translational Medicine, College of Life Sciences, National Chung Hsing University, Taichung, Taiwan
8
Arendt-Tranholm A, Mwirigi JM, Price TJ. RNA isoform expression landscape of the human dorsal root ganglion (DRG) generated from long read sequencing. bioRxiv 2023:2023.10.28.564535. [PMID: 37961262 PMCID: PMC10634934 DOI: 10.1101/2023.10.28.564535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Splicing is a post-transcriptional RNA processing mechanism that enhances genomic complexity by creating multiple isoforms from the same gene. Diversity in splicing in the mammalian nervous system is associated with neuronal development, synaptic function and plasticity, as well as with diseases of the nervous system ranging from neurodegeneration to chronic pain. We aimed to characterize the isoforms expressed in the human peripheral nervous system, with the goal of creating a resource to identify novel isoforms of functionally relevant genes associated with somatosensation and nociception. We used long-read sequencing (LRS) to document isoform expression in the human dorsal root ganglia (hDRG) from 3 organ donors. Isoforms were validated in silico by confirming expression in hDRG short-read sequencing (SRS) data from 3 independent organ donors. A total of 19,547 isoforms of protein-coding genes were detected using LRS and validated with SRS and strict expression cutoffs. We identified 763 isoforms with at least one previously undescribed splice junction. Previously unannotated isoforms of multiple pain-associated genes, including ASIC3, MRGPRX1 and HNRNPK, were identified. In the novel isoforms of ASIC3, a region comprising ~35% of the 5'UTR was excised. In contrast, a novel splice junction was utilized in isoforms of MRGPRX1 to include an additional exon upstream of the start codon, consequently adding a region to the 5'UTR. Novel isoforms of HNRNPK were identified that utilized previously unannotated splice sites to both excise exon 14 and include a sequence in the 5' end of exon 13. The insertion and deletion in the coding region were predicted to remove a serine phosphorylation site favored by cdc2 and replace it with a tyrosine phosphorylation site potentially phosphorylated by SRC. We also independently confirm a recently reported DRG-specific splicing event in WNK1 that gives insight into how painless peripheral neuropathy occurs when this gene is mutated. Our findings give a clear overview of mRNA isoform diversity in the hDRG obtained using LRS. Using this work as a foundation, an important next step will be to use LRS on hDRG tissues recovered from people with a history of chronic pain. This should enable identification of new drug targets and a better understanding of chronic pain that may involve aberrant splicing events.
Affiliation(s)
- Asta Arendt-Tranholm
- School of Behavioral and Brain Sciences, Department of Neuroscience and Center for Advanced Pain Studies, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, Texas, 75080
- Juliet M. Mwirigi
- School of Behavioral and Brain Sciences, Department of Neuroscience and Center for Advanced Pain Studies, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, Texas, 75080
- Theodore J. Price
- School of Behavioral and Brain Sciences, Department of Neuroscience and Center for Advanced Pain Studies, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, Texas, 75080
9
Kontos CK, Hadjichambi D, Papatsirou M, Karousi P, Christodoulou S, Sideris DC, Scorilas A. Discovery and Comprehensive Characterization of Novel Circular RNAs of the Apoptosis-Related BOK Gene in Human Ovarian and Prostate Cancer Cells, Using Nanopore Sequencing. Noncoding RNA 2023; 9:57. [PMID: 37888203 PMCID: PMC10609399 DOI: 10.3390/ncrna9050057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 09/14/2023] [Accepted: 09/22/2023] [Indexed: 10/28/2023]
Abstract
CircRNAs have become a research hotspot, and an increasing number of studies have shed light on their involvement in malignant progression. Prompted by the apparent gap in knowledge about circRNAs derived from apoptosis-related genes such as BOK, we focused on identifying novel BOK circRNAs in human ovarian and prostate cancer cells. Total RNA was extracted from ovarian and prostate cancer cell lines and reverse-transcribed using random hexamer primers. A series of PCR assays using gene-specific divergent primers was carried out. Next, third-generation sequencing based on nanopore technology, followed by extensive bioinformatics analysis, led to the discovery of 23 novel circRNAs. These novel circRNAs consist of both exonic and intronic regions of the BOK gene. Interestingly, the exons that form the back-splice junction were truncated in most circRNAs, and multiple back-splice sites were found for each BOK exon. Moreover, several BOK circRNAs are predicted to sponge microRNAs with key roles in reproductive cancers, while the presence of putative open reading frames indicates their translational potential. Overall, this study suggests that distinct alternative splicing events lead to the production of novel BOK circRNAs, which could play a role in the molecular landscape and clinical investigation of ovarian and prostate cancer.
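Detecting a circRNA in sequencing reads hinges on its back-splice junction, where the 3' end of a downstream exon is joined to the 5' end of an upstream exon; divergent primers are designed to amplify across that junction. The Python sketch below, a hypothetical illustration rather than the authors' bioinformatics pipeline, builds a junction-spanning sequence from the circRNA's segments in genomic order and checks whether a read contains it. It assumes exact matching, reads on the same strand, and full-length segments; error-prone nanopore reads and the truncated back-splice exons reported here would require alignment-based junction calling instead.

```python
def backsplice_junction(exons: list[str], flank: int = 10) -> str:
    """Sequence spanning a back-splice junction (hypothetical helper).

    exons: circRNA segment sequences (exonic or retained intronic) in genomic
    5'->3' order. The junction joins the last `flank` nt of the final segment
    to the first `flank` nt of the first segment.
    """
    circ = "".join(exons).upper()
    if flank <= 0 or len(circ) < 2 * flank:
        raise ValueError("flank must be positive and at most half the circRNA length")
    return circ[-flank:] + circ[:flank]


def read_spans_junction(read: str, exons: list[str], flank: int = 10) -> bool:
    """True if the read contains the exact junction sequence (no mismatches allowed)."""
    return backsplice_junction(exons, flank) in read.upper()


exons = ["AAAACCCC", "GGGGTTTT"]  # toy segments, not BOK sequence
print(backsplice_junction(exons, flank=4))  # TTTTAAAA
print(read_spans_junction("ggTTTTAAAAcc", exons, flank=4))  # True
print(read_spans_junction("CCCCGGGG", exons, flank=4))  # False: linear splice junction only
```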
Affiliation(s)
- Christos K. Kontos
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, 15701 Athens, Greece
- Despina Hadjichambi
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, 15701 Athens, Greece
- Maria Papatsirou
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, 15701 Athens, Greece
- Paraskevi Karousi
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, 15701 Athens, Greece
- Spyridon Christodoulou
- Fourth Department of Surgery, University General Hospital “Attikon”, National and Kapodistrian University of Athens, 12462 Athens, Greece
- Diamantis C. Sideris
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, 15701 Athens, Greece
- Andreas Scorilas
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, 15701 Athens, Greece
10
Zhou Y, Wu J, Yao S, Xu Y, Zhao W, Tong Y, Zhou Z. DeepCIP: A multimodal deep learning method for the prediction of internal ribosome entry sites of circRNAs. Comput Biol Med 2023; 164:107288. [PMID: 37542919 DOI: 10.1016/j.compbiomed.2023.107288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 07/05/2023] [Accepted: 07/28/2023] [Indexed: 08/07/2023]
Abstract
Circular RNAs (circRNAs) can encode proteins through internal ribosome entry sites (IRESs), which are essential RNA regulatory elements for cap-independent translation. Identifying IRES elements in circRNAs is crucial for understanding their function. Previous studies have presented IRES predictors based on machine learning techniques, but these were designed mainly for linear RNA IRESs. In this study, we propose DeepCIP (Deep learning method for CircRNA IRES Prediction), a multimodal deep learning approach that uses both sequence and structural information for circRNA IRES prediction. Our results demonstrate the effectiveness of DeepCIP's sequence and structure models for feature extraction and suggest that integrating sequence and structural information improves prediction accuracy. Comparative studies indicate that DeepCIP outperforms other methods on the test set and on a real circRNA IRES dataset. Furthermore, an integrated interpretability analysis shows that the sequence patterns learned by the model align with previously discovered motifs that facilitate circRNA translation. DeepCIP therefore has the potential to advance the study of circRNA coding potential and to contribute to the design of circRNA-based drugs. DeepCIP is freely available as a standalone program at https://github.com/zjupgx/DeepCIP.
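DeepCIP learns its own sequence and structure representations; the paper and repository describe the details. One point applies to any sequence-derived feature for a circRNA: the molecule has no ends, so features must wrap across the back-splice junction. The hypothetical sketch below shows this for simple k-mer frequencies. It is not DeepCIP's feature extraction.

```python
from collections import Counter
from itertools import product


def circular_kmer_counts(seq: str, k: int) -> Counter:
    """Count k-mers around a circRNA, including those spanning the back-splice junction.

    The sequence is wrapped by appending its first k-1 bases, so a circle of
    length L yields exactly L k-mers (a linear string yields L - k + 1).
    """
    seq = seq.upper().replace("T", "U")
    if not 1 <= k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    wrapped = seq + seq[:k - 1]
    return Counter(wrapped[i:i + k] for i in range(len(seq)))


def kmer_frequency_vector(seq: str, k: int, alphabet: str = "ACGU") -> list[float]:
    """Fixed-length, normalized k-mer frequency vector with 4**k entries."""
    counts = circular_kmer_counts(seq, k)
    total = sum(counts.values())
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


counts = circular_kmer_counts("ACGU", 2)
print(sorted(counts))  # ['AC', 'CG', 'GU', 'UA']; 'UA' spans the junction
```

A fixed-length vector like this could feed a conventional classifier. Deep models such as DeepCIP instead learn features directly from the sequence.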
Affiliation(s)
- Yuxuan Zhou
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; Zhejiang University Innovation Institute for Artificial Intelligence in Medicine - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
- Jingcheng Wu
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Shihao Yao
- College of Life Sciences, China Jiliang University, Hangzhou, 310018, China; China Jiliang University - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
- Yulian Xu
- College of Life Sciences, China Jiliang University, Hangzhou, 310018, China; China Jiliang University - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
- Wenbin Zhao
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; Zhejiang University Innovation Institute for Artificial Intelligence in Medicine - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
- Yunguang Tong
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; College of Life Sciences, China Jiliang University, Hangzhou, 310018, China; Aoming (Hangzhou) Biomedical Co., Ltd., Hangzhou, 310018, China; Zhejiang University Innovation Institute for Artificial Intelligence in Medicine - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China; China Jiliang University - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
- Zhan Zhou
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, 322000, China
11
Wang Z, Cui Q, Su C, Zhao S, Wang R, Wang Z, Meng J, Luan Y. Unveiling the secrets of non-coding RNA-encoded peptides in plants: A comprehensive review of mining methods and research progress. Int J Biol Macromol 2023:124952. [PMID: 37257526 DOI: 10.1016/j.ijbiomac.2023.124952] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/22/2023] [Revised: 05/15/2023] [Accepted: 05/16/2023] [Indexed: 06/02/2023]
Abstract
Non-coding RNAs (ncRNAs) are not conventionally associated with protein coding. However, recent findings indicate that some ncRNAs can encode proteins or peptides. These ncRNA-encoded peptides (ncPEPs) are vital for diverse plant life processes and have significant potential value. Despite their importance, research on plant ncPEPs is limited: few studies have been conducted, little is known about the underlying mechanisms, and the field remains in its infancy. This review provides a comprehensive overview of methods for mining ncPEPs in plants, focusing on prediction, identification, and functional analysis. We discuss the strengths and weaknesses of various techniques, identify future research directions in the ncPEP field, and describe the biological functions and agricultural application prospects of plant ncPEPs. By highlighting the potential and research value of ncPEPs, we aim to lay a solid foundation for more in-depth studies in plant science.
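Prediction-based ncPEP mining typically begins by scanning ncRNA transcripts for small open reading frames (sORFs). The sketch below shows that first step under several assumptions: the standard genetic code, forward strand only, ATG starts, and hypothetical length cutoffs of 10 to 100 codons. It is an illustration, not a method from the review. Real pipelines add coding-potential scores, conservation, and Ribo-seq or mass spectrometry evidence.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def find_sorfs(seq: str, min_aa: int = 10, max_aa: int = 100) -> list[tuple[int, int, str]]:
    """Return (start, end, peptide) for ATG-initiated ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; the peptide excludes it.
    Only the most upstream ATG before each stop codon is reported.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if CODON_TABLE.get(seq[j:j + 3]) == "*":
                        n_aa = (j - i) // 3
                        if min_aa <= n_aa <= max_aa:
                            hits.append((i, j + 3, translate(seq[i:j])))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop before the transcript ends
            i += 3
    return hits


print(find_sorfs("CCATG" + "GCT" * 10 + "TAAGG"))  # [(2, 38, 'MAAAAAAAAAA')]
```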
Affiliation(s)
- Zhengjie Wang
- School of Bioengineering, Dalian University of Technology, Dalian 116024, China
- Qi Cui
- School of Bioengineering, Dalian University of Technology, Dalian 116024, China
- Chenglin Su
- School of Bioengineering, Dalian University of Technology, Dalian 116024, China
- Siyuan Zhao
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Ruiming Wang
- School of Bioengineering, Dalian University of Technology, Dalian 116024, China
- Zhicheng Wang
- School of Bioengineering, Dalian University of Technology, Dalian 116024, China
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian 116024, China
12
Di Stefano V, Prinzi F, Luigetti M, Russo M, Tozza S, Alonge P, Romano A, Sciarrone MA, Vitali F, Mazzeo A, Gentile L, Palumbo G, Manganelli F, Vitabile S, Brighina F. Machine Learning for Early Diagnosis of ATTRv Amyloidosis in Non-Endemic Areas: A Multicenter Study from Italy. Brain Sci 2023; 13:brainsci13050805. [PMID: 37239276 DOI: 10.3390/brainsci13050805] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 04/06/2023] [Revised: 05/12/2023] [Accepted: 05/14/2023] [Indexed: 05/28/2023]
Abstract
BACKGROUND Hereditary transthyretin amyloidosis with polyneuropathy (ATTRv) is an adult-onset multisystemic disease affecting the peripheral nerves, heart, gastrointestinal tract, eyes, and kidneys. Several treatment options are now available, so avoiding misdiagnosis is crucial for starting therapy in early disease stages. However, clinical diagnosis may be difficult, as the disease may present with nonspecific symptoms and signs. We hypothesized that the diagnostic process could benefit from the use of machine learning (ML).
METHODS A total of 397 patients referred to neuromuscular clinics in 4 centers in southern Italy with neuropathy and at least 1 additional red flag, and who underwent genetic testing for ATTRv, were considered. Only probands were then retained for analysis. This gave a cohort of 184 patients for the classification task: 93 with positive and 91 (age- and sex-matched) with negative genetic testing. The XGBoost (XGB) algorithm was trained to classify patients with and without a TTR mutation. The SHAP method was used as an explainable artificial intelligence algorithm to interpret the model's findings.
RESULTS Diabetes, gender, unexplained weight loss, cardiomyopathy, bilateral carpal tunnel syndrome (CTS), ocular symptoms, autonomic symptoms, ataxia, renal dysfunction, lumbar canal stenosis, and history of autoimmunity were used for model training. The XGB model showed an accuracy of 0.707 ± 0.101, a sensitivity of 0.712 ± 0.147, a specificity of 0.704 ± 0.150, and an AUC-ROC of 0.752 ± 0.107. The SHAP explanation confirmed that unexplained weight loss, gastrointestinal symptoms, and cardiomyopathy were significantly associated with a genetic diagnosis of ATTRv, while bilateral CTS, diabetes, autoimmunity, and ocular and renal involvement were associated with a negative genetic test.
CONCLUSIONS Our data show that ML might be a useful tool for identifying patients with neuropathy who should undergo genetic testing for ATTRv. Unexplained weight loss and cardiomyopathy are relevant red flags for ATTRv in southern Italy. Further studies are needed to confirm these findings.
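The abstract reports accuracy, sensitivity and specificity as mean ± standard deviation, but it does not say how the repeats were formed. Cross-validation folds or repeated train/test splits are the usual choices. Assuming per-fold binary predictions are available, the sketch below shows how such figures are computed. It uses the sample standard deviation, which is also an assumption. The function names are hypothetical.

```python
from statistics import mean, stdev


def confusion_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, sensitivity (recall of class 1) and specificity (recall of class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # assumes every fold contains positives
        "specificity": tn / (tn + fp),  # assumes every fold contains negatives
    }


def summarize_folds(folds):
    """(mean, sample SD) of each metric across a list of (y_true, y_pred) folds."""
    per_fold = [confusion_metrics(t, p) for t, p in folds]
    return {name: (mean(m[name] for m in per_fold), stdev(m[name] for m in per_fold))
            for name in per_fold[0]}


# Two hypothetical folds (1 = TTR mutation, 0 = no mutation).
folds = [([1, 1, 0, 0], [1, 0, 0, 0]), ([1, 1, 0, 0], [1, 1, 1, 0])]
print(summarize_folds(folds)["sensitivity"])  # mean 0.75, SD about 0.354
```

Stratified folds, which keep the class ratio in each fold, avoid the division-by-zero cases noted in the comments.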
Affiliation(s)
- Vincenzo Di Stefano
- Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, 90127 Palermo, Italy
- Francesco Prinzi
- Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, 90127 Palermo, Italy
- Marco Luigetti
- Fondazione Policlinico Universitario A. Gemelli IRCCS, UOC Neurologia, 00168 Rome, Italy
- Department of Neurosciences, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Massimo Russo
- Department of Clinical and Experimental Medicine, University of Messina, 98182 Messina, Italy
- Stefano Tozza
- Department of Neuroscience, Reproductive and Odontostomatological Science, University of Naples "Federico II", 80131 Naples, Italy
- Paolo Alonge
- Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, 90127 Palermo, Italy
- Angela Romano
- Fondazione Policlinico Universitario A. Gemelli IRCCS, UOC Neurologia, 00168 Rome, Italy
- Department of Neurosciences, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Maria Ausilia Sciarrone
- Fondazione Policlinico Universitario A. Gemelli IRCCS, UOC Neurologia, 00168 Rome, Italy
- Department of Neurosciences, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Francesca Vitali
- Fondazione Policlinico Universitario A. Gemelli IRCCS, UOC Neurologia, 00168 Rome, Italy
- Department of Neurosciences, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Anna Mazzeo
- Department of Clinical and Experimental Medicine, University of Messina, 98182 Messina, Italy
- Luca Gentile
- Department of Clinical and Experimental Medicine, University of Messina, 98182 Messina, Italy
- Giovanni Palumbo
- Department of Neuroscience, Reproductive and Odontostomatological Science, University of Naples "Federico II", 80131 Naples, Italy
- Fiore Manganelli
- Department of Neuroscience, Reproductive and Odontostomatological Science, University of Naples "Federico II", 80131 Naples, Italy
- Salvatore Vitabile
- Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, 90127 Palermo, Italy
- Filippo Brighina
- Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, 90127 Palermo, Italy
13
Sparks ME, Wang YM, Shi J, Harrison RL. Lymantria Dispar Iflavirus 1 RNA Comprises a Large Proportion of RNA in Adult L. dispar Moths. Insects 2023; 14:insects14050466. [PMID: 37233094 DOI: 10.3390/insects14050466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 04/27/2023] [Accepted: 05/12/2023] [Indexed: 05/27/2023]
Abstract
The spongy moth virus Lymantria dispar iflavirus 1 (LdIV1), originally identified in a Lymantria dispar cell line, was detected in 24 RNA samples from female moths of four populations from the USA and China. Genome-length contigs were assembled for each population and compared with the first reported LdIV1 reference genome (Ames strain) and with two LdIV1 sequences in GenBank originating from Novosibirsk, Russian Federation. A whole-genome phylogeny generated for these sequences indicated that LdIV1 viruses from North American (flightless) and Asian (flighted) spongy moth lineages partition into clades, as expected from their hosts' geographic origin and biotype. A comprehensive list of synonymous and non-synonymous mutations, as well as indels, among the polyprotein coding sequences of these seven LdIV1 variants was compiled. A codon-level phylogram computed from the polyprotein sequences of these variants and 50 additional iflaviruses placed LdIV1 in a large clade consisting mostly of iflaviruses from other Lepidoptera species. Notably, LdIV1 RNA was present at very high levels in all samples, with LdIV1 reads accounting for a mean of 36.41% (range 1.84% to 68.75%; standard deviation 20.91) of the total sequenced volume.
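The final sentence summarizes viral load as the share of sequenced reads per sample, then as a mean, range and standard deviation across samples. A minimal sketch of that arithmetic follows. It assumes per-sample counts of virus-mapped and total reads are already available. The read-mapping step is not shown, the counts are hypothetical, and the use of the sample (not population) standard deviation is an assumption.

```python
from statistics import mean, stdev


def viral_read_percentages(read_counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map sample ID -> percentage of reads assigned to the virus.

    `read_counts` maps sample ID -> (viral_reads, total_reads).
    """
    return {sample: 100.0 * viral / total for sample, (viral, total) in read_counts.items()}


def summarize(percentages: dict[str, float]) -> dict[str, float]:
    """Mean, range and sample standard deviation of per-sample percentages."""
    values = list(percentages.values())
    return {"mean": mean(values), "min": min(values), "max": max(values), "sd": stdev(values)}


# Hypothetical counts for three samples (not data from the study).
pcts = viral_read_percentages({"s1": (10, 100), "s2": (30, 100), "s3": (50, 100)})
print(summarize(pcts))  # {'mean': 30.0, 'min': 10.0, 'max': 50.0, 'sd': 20.0}
```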
Affiliation(s)
- Michael E Sparks
- Invasive Insect Biocontrol and Behavior Laboratory, USDA-ARS, Beltsville, MD 20705, USA
- Yi-Ming Wang
- Sino-France Joint Laboratory for Invasive Forest Pests in Eurasia, Beijing Forestry University, Beijing 100083, China
- Juan Shi
- Sino-France Joint Laboratory for Invasive Forest Pests in Eurasia, Beijing Forestry University, Beijing 100083, China
- Robert L Harrison
- Invasive Insect Biocontrol and Behavior Laboratory, USDA-ARS, Beltsville, MD 20705, USA
14
Crudele F, Bianchi N, Terrazzan A, Ancona P, Frassoldati A, Gasparini P, D'Adamo AP, Papaioannou D, Garzon R, Wójcicka A, Gaj P, Jażdżewski K, Palatini J, Volinia S. Circular RNAs Could Encode Unique Proteins and Affect Cancer Pathways. Biology 2023; 12:biology12040493. [PMID: 37106694 PMCID: PMC10135897 DOI: 10.3390/biology12040493] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/07/2023] [Revised: 03/10/2023] [Accepted: 03/21/2023] [Indexed: 04/29/2023]
Abstract
circRNAs constitute a novel class of RNA generally considered non-coding; nonetheless, their coding potential has come under scrutiny. In this work, we systematically explored the predicted proteins of more than 160,000 circRNAs detected by exome-capture RNA sequencing and collected in the MiOncoCirc pan-cancer compendium, which includes normal and cancer samples from different tissue types. For functional evaluation, we compared their primary structure and domain composition with those of the proteins encoded by the corresponding linear mRNAs. Among the 4362 circRNAs potentially encoding proteins with a unique primary structure and the 1179 encoding proteins with a novel domain composition, 183 were differentially expressed in cancer. In particular, eight were associated with prognosis in acute myeloid leukemia. Functional classification of the dysregulated circRNA-encoded polypeptides showed enrichment in heme and cancer signaling, DNA binding, and phosphorylation processes, and revealed roles for some circRNA-based effectors in cancer.
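A circRNA can encode a protein with a primary structure absent from its linear counterpart when an ORF reads through the back-splice junction. An ORF with no in-frame stop codon could in principle continue around the circle indefinitely. The hypothetical sketch below illustrates both cases under these assumptions: the standard genetic code, ATG starts, and forward-strand reading. It reports every ATG, including nested ones. It is not the prediction method used in the study.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def circular_orfs(circ: str, min_aa: int = 20) -> list[dict]:
    """Find ATG-initiated ORFs on a circRNA, reading through the back-splice
    junction at position len(circ).

    After three laps the reading frame returns to its starting point, so an
    ORF with no stop codon within 3 * len(circ) nt never terminates
    (a candidate for rolling-circle translation).
    """
    seq = circ.upper().replace("U", "T")
    n = len(seq)
    unrolled = seq * 4  # enough sequence to read 3 laps from any start in lap 1
    orfs = []
    for start in range(n):
        if unrolled[start:start + 3] != "ATG":
            continue
        peptide, stop = [], None
        for pos in range(start, start + 3 * n, 3):
            aa = CODON_TABLE.get(unrolled[pos:pos + 3], "X")
            if aa == "*":
                stop = pos
                break
            peptide.append(aa)
        if len(peptide) < min_aa:
            continue
        end = None if stop is None else stop + 3
        orfs.append({
            "start": start,
            "end": end,  # exclusive, in unrolled coordinates; None if no stop
            "peptide": "".join(peptide),
            "crosses_junction": end is None or end > n,
            "rolling_circle": stop is None,
        })
    return orfs


# Toy circle: the ATG at position 9 reads across the junction into "GCC", then stops at "TAA".
print(circular_orfs("GCCTAAGGGATG", min_aa=1))
```

For a non-terminating ORF, `peptide` holds the first `len(circ)` codons read.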
Affiliation(s)
- Francesca Crudele
- Department of Translational Medicine, University of Ferrara, 44121 Ferrara, Italy
- Genetics Unit, Institute for Maternal and Child Health, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS) Burlo Garofolo, 34137 Trieste, Italy
- Nicoletta Bianchi
- Department of Translational Medicine, University of Ferrara, 44121 Ferrara, Italy
- Anna Terrazzan
- Department of Translational Medicine, University of Ferrara, 44121 Ferrara, Italy
- Laboratory for Advanced Therapy Technologies (LTTA), University of Ferrara, 44121 Ferrara, Italy
- Pietro Ancona
- Department of Translational Medicine, University of Ferrara, 44121 Ferrara, Italy
- Antonio Frassoldati
- Department of Oncology, Azienda Ospedaliero-Universitaria St. Anna di Ferrara, 44124 Ferrara, Italy
- Paolo Gasparini
- Genetics Unit, Institute for Maternal and Child Health, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS) Burlo Garofolo, 34137 Trieste, Italy
- Adamo P D'Adamo
- Genetics Unit, Institute for Maternal and Child Health, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS) Burlo Garofolo, 34137 Trieste, Italy
- Dimitrios Papaioannou
- Laura and Isaac Perlmutter Cancer Center, New York University School of Medicine, NYU Langone Health, New York, NY 10016, USA
- Ramiro Garzon
- Division of Hematology and Hematological Malignancies, University of Utah, Salt Lake City, UT 84112, USA
- Paweł Gaj
- Warsaw Genomics INC, 01-682 Warszawa, Poland
- Krystian Jażdżewski
- Human Cancer Genetics, Biological and Chemical Research Centre, University of Warsaw, 02-089 Warsaw, Poland
- Jeffrey Palatini
- Genomics Core Facility, Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
- Stefano Volinia
- Department of Translational Medicine, University of Ferrara, 44121 Ferrara, Italy
- Laboratory for Advanced Therapy Technologies (LTTA), University of Ferrara, 44121 Ferrara, Italy
- CNBCh, Biological and Chemical Research Centre, University of Warsaw, 02-089 Warsaw, Poland
15
Liu Y, Liu Y, Wang S, Zhu X. LBCE-XGB: A XGBoost Model for Predicting Linear B-Cell Epitopes Based on BERT Embeddings. Interdiscip Sci 2023; 15:293-305. [PMID: 36646842 DOI: 10.1007/s12539-023-00549-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 12/28/2022] [Accepted: 01/03/2023] [Indexed: 01/18/2023]
Abstract
Accurate detection of linear B-cell epitopes (BCEs) is important for vaccine design, immunodiagnostic tests, antibody production, and disease prevention and treatment. Wet-lab experiments for determining linear BCEs are expensive and laborious and cannot keep pace with today's massive volumes of protein sequence data. Computational methods, by contrast, can identify linear BCEs efficiently and at low cost. Although several computational methods are available, their performance is still unsatisfactory. We therefore propose a new method, LBCE-XGB, that predicts linear BCEs using the XGBoost algorithm. To represent the biological information concealed in peptide sequences, residue embeddings were obtained from a pre-trained domain-specific BERT model. In addition, five other types of features, including amino acid composition and an amino acid antigenicity scale, were extracted. The best feature combination was determined from cross-validation results. Compared with models built using other deep learning and machine learning algorithms, LBCE-XGB achieves the best performance, with an AUROC of 0.845 in fivefold cross-validation. On the independent test set, our model attains an AUROC of 0.838, substantially higher than other state-of-the-art methods. These results indicate that BERT representations can be effective features for predicting linear BCEs, and we believe that LBCE-XGB can be a useful tool for detecting linear B-cell epitopes with high accuracy and low cost.
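LBCE-XGB is compared with other methods by AUROC: 0.845 in fivefold cross-validation and 0.838 on the independent test set. As a reminder of what that number measures, here is a minimal sketch of AUROC through its rank (Mann-Whitney) interpretation. It is a generic illustration, not code from LBCE-XGB. In practice, scikit-learn's `roc_auc_score` computes the same quantity.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half. O(P * N); fine for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical epitope (1) / non-epitope (0) labels and model scores.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

An AUROC of 0.845 therefore means that a randomly chosen epitope outscores a randomly chosen non-epitope about 84.5% of the time.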
Affiliation(s)
- Yufeng Liu
- School of Sciences, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Yinbo Liu
- School of Sciences, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Shuyu Wang
- School of Sciences, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, 230036, Anhui, China
16
Sokhansanj BA, Zhao Z, Rosen GL. Interpretable and Predictive Deep Neural Network Modeling of the SARS-CoV-2 Spike Protein Sequence to Predict COVID-19 Disease Severity. Biology 2022; 11:1786. [PMID: 36552295 PMCID: PMC9774807 DOI: 10.3390/biology11121786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 11/28/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022]
Abstract
Throughout the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical for planning public health responses as the virus continues to evolve. This paper presents a novel computational framework that complements conventional lineage classification and applies it to predict the severe-disease potential of viral genetic variation. The transformer-based neural network architecture has additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization. First, training a model to predict SARS-CoV-2 taxonomy validates the architecture's interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequences and patient metadata from GISAID. Confounding effects of changing patient demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent inputs to the neural network model. The resulting model can be interpreted to identify potentially significant viral mutations and proves to be a robust predictive tool. Although it was trained entirely on sequence data obtained before empirical data for Omicron were available, the model predicts Omicron's reduced risk of severe disease, in accord with epidemiological and experimental data.
Affiliation(s)
- Bahrad A. Sokhansanj
- Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Drexel University, Philadelphia, PA 19104, USA
17
Shen T, Liu D, Lin Z, Ren C, Zhao W, Gao W. A Machine Learning Model to Predict Cardiovascular Events during Exercise Evaluation in Patients with Coronary Heart Disease. J Clin Med 2022; 11:jcm11206061. [PMID: 36294382 PMCID: PMC9605581 DOI: 10.3390/jcm11206061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 10/02/2022] [Accepted: 10/12/2022] [Indexed: 01/24/2023]
Abstract
OBJECTIVE To develop and optimize a machine learning model for predicting cardiovascular events during exercise evaluation in patients with coronary heart disease (CHD).
METHODS A total of 16,645 cardiopulmonary exercise tests (CPETs) performed in patients with CHD from January 2016 to September 2019 were retrospectively included. Clinical data collected before testing and data recorded during exercise were analyzed.
RESULTS Cardiovascular events occurred during 505 CPETs (3.0%). No deaths were reported. The predictive accuracy of the models was evaluated by the area under the curve (AUC). The AUCs for SVM, logistic regression, GBDT, and XGBoost were 0.686, 0.778, 0.784, and 0.794, respectively.
CONCLUSIONS Machine learning methods (especially XGBoost) can effectively predict cardiovascular events during exercise evaluation in CHD patients. Cardiovascular events were associated with age, male sex, diabetes and its duration, history of myocardial infarction, smoking history, history of hyperlipidemia, history of hypertension, oxygen uptake, and ventilation efficiency indicators.
Affiliation(s)
- Tao Shen
- Department of Cardiology, Peking University Third Hospital, National Health Commission Key Laboratory of Cardiovascular Molecular Biology and Regulatory Peptides, Beijing 100191, China
- Dan Liu
- Department of Cardiology, Peking University Third Hospital, National Health Commission Key Laboratory of Cardiovascular Molecular Biology and Regulatory Peptides, Beijing 100191, China
- Zi Lin
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- Chuan Ren
- Department of Cardiology, Peking University Third Hospital, National Health Commission Key Laboratory of Cardiovascular Molecular Biology and Regulatory Peptides, Beijing 100191, China
- Wei Zhao
- Department of Cardiology, Peking University Third Hospital, National Health Commission Key Laboratory of Cardiovascular Molecular Biology and Regulatory Peptides, Beijing 100191, China
- Physical Examination Center, Peking University Third Hospital, Beijing 100191, China
- Corresponding author
- Wei Gao
- Department of Cardiology, Peking University Third Hospital, National Health Commission Key Laboratory of Cardiovascular Molecular Biology and Regulatory Peptides, Beijing 100191, China
- Corresponding author
18
Milella F, Famiglini L, Banfi G, Cabitza F. Application of Machine Learning to Improve Appropriateness of Treatment in an Orthopaedic Setting of Personalized Medicine. J Pers Med 2022; 12:jpm12101706. [PMID: 36294845 PMCID: PMC9604727 DOI: 10.3390/jpm12101706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 09/26/2022] [Accepted: 10/08/2022] [Indexed: 11/07/2022]
Abstract
The rise of personalized medicine and its remarkable advances have created new requirements for appropriate medical decision-making models. Computer science plays an essential role in personalized medicine, where one goal is to provide algorithms and tools that extract knowledge and improve the decision-support process. The minimum clinically important difference (MCID) is the smallest change in patient-reported outcome measure (PROM) scores that patients perceive as meaningful. Treatment that does not achieve this minimum level of improvement is considered inappropriate and a potential waste of resources. Using the MCID threshold to identify patients who fail to achieve a meaningful change in PROM scores may aid pre-surgical shared decision-making. The decision tree algorithm extracts valuable information and presents it to the domain expert in a form that supports decision-making. In the present study, different machine learning-based tools were developed. On the one hand, we compared three XGBoost models for predicting non-achievement of the MCID in the SF-12 physical score at six months post-operation. The prediction score threshold was set to 0.75 to define three decision-making areas on the basis of high-confidence (HC) intervals. The minority class was rebalanced by weighting the positive class to penalize the loss function (cost-sensitive XGBoost), by oversampling the minority class (XGBoost with SMOTE), or by undersampling the negative class (XGBoost with undersampling). On the other hand, we modeled the data with a decision tree (assessment tree) at different complexity levels to identify hidden patterns and to offer a new way of understanding possible relationships between the gathered features and the various outcomes.
The results showed that all the proposed models were effective binary classifiers, with moderate predictive performance for both the minority (positive) class, i.e., our targeted patients who will not benefit from surgery, and the negative class. The decision tree visualization can be used during patient assessment to better understand whether a patient will benefit from the medical intervention. Both tools can help increase knowledge about the patient’s psychophysical state and support an increasingly specialized assessment of the individual patient.
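The abstract does not say exactly how the 0.75 threshold produces three decision-making areas, or what class weight the cost-sensitive model used. The sketch below makes two labeled assumptions. First, the areas are symmetric around 0.5, so scores at or above 0.75 count as high-confidence positive, scores at or below 0.25 as high-confidence negative, and the rest as uncertain. Second, the positive-class weight is the negative-to-positive ratio, which the XGBoost documentation suggests as a typical value for its `scale_pos_weight` parameter. Both function names are hypothetical.

```python
def decision_area(score: float, threshold: float = 0.75) -> str:
    """Assign a predicted probability to one of three zones.

    ASSUMPTION: zones are symmetric around 0.5: score >= threshold is a
    high-confidence positive, score <= 1 - threshold is a high-confidence
    negative, and anything in between is left to clinical judgment.
    """
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1.0]")
    if score >= threshold:
        return "high-confidence positive"
    if score <= 1.0 - threshold:
        return "high-confidence negative"
    return "uncertain"


def positive_class_weight(labels: list[int]) -> float:
    """Negative/positive ratio, a common choice for XGBoost's `scale_pos_weight`
    in cost-sensitive training (ASSUMPTION: not the study's reported value)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("no positive examples")
    return (len(labels) - n_pos) / n_pos


print([decision_area(s) for s in (0.9, 0.5, 0.1)])
# ['high-confidence positive', 'uncertain', 'high-confidence negative']
print(positive_class_weight([1, 0, 0, 0]))  # 3.0
```

Keeping an explicit "uncertain" band lets the model defer borderline cases to the clinician instead of forcing a binary call.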
Affiliation(s)
- Frida Milella
- IRCCS Istituto Ortopedico Galeazzi, Via Cristina Belgioioso 173, 20157 Milano, Italy
- Corresponding author
- Lorenzo Famiglini
- DISCo, Dipartimento di Informatica, Sistemistica e Comunicazione, University of Milano–Bicocca, Viale Sarca 336, 20126 Milano, Italy
- Giuseppe Banfi
- IRCCS Istituto Ortopedico Galeazzi, Via Cristina Belgioioso 173, 20157 Milano, Italy
- Faculty of Medicine and Surgery, Università Vita-Salute San Raffaele, 20132 Milano, Italy
- Federico Cabitza
- IRCCS Istituto Ortopedico Galeazzi, Via Cristina Belgioioso 173, 20157 Milano, Italy
- DISCo, Dipartimento di Informatica, Sistemistica e Comunicazione, University of Milano–Bicocca, Viale Sarca 336, 20126 Milano, Italy
19
Sokhansanj BA, Rosen GL. Predicting COVID-19 disease severity from SARS-CoV-2 spike protein sequence by mixed effects machine learning. Comput Biol Med 2022; 149:105969. [PMID: 36041271 PMCID: PMC9384346 DOI: 10.1016/j.compbiomed.2022.105969] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/06/2022] [Revised: 07/11/2022] [Accepted: 08/13/2022] [Indexed: 11/17/2022]
Abstract
Epidemiological studies show that COVID-19 variants of concern, such as Delta and Omicron, pose different risks of severe disease, but they typically lack sequence-level information for the virus. Studies that do obtain viral genome sequences are generally limited in time, location, and population scope. Retrospective meta-analyses require time-consuming data extraction from heterogeneous formats and are limited to publicly available reports. Fortuitously, a subset of GISAID, the global SARS-CoV-2 sequence repository, includes "patient status" metadata that can indicate whether a sequence record is associated with mild or severe disease. Although GISAID lacks data on comorbidities relevant to severity, such as obesity and chronic disease, it does include age and sex metadata that can serve as additional attributes in modeling. With these caveats, previous efforts have demonstrated that genotype-patient status models can be fit to GISAID data, particularly when country of origin is used as an additional feature. But are these models robust and biologically meaningful? This paper shows that temporal and geographic biases in sequences submitted to GISAID create complex issues for model development and interpretation. The evolving pandemic response, particularly the reduction in severe disease due to vaccination, adds further complications. The paper proposes a potential solution: efficient mixed effects machine learning using GPBoost, treating country as a random effect group. Training and validation on temporally split GISAID data and emerging Omicron variants demonstrate that GPBoost models are more predictive of the impact of spike protein mutations on patient outcomes than fixed effect XGBoost, LightGBM, random forest, and elastic net logistic regression models.
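The validation design described above trains on earlier GISAID records, validates on later ones, and treats country as a random-effect group. Its data-handling side can be sketched in a few lines. The record fields (`date`, `country`), the cutoff date and the function names are hypothetical, and the GPBoost model fitting itself is not shown. The comment about unseen groups reflects standard grouped random-effects models, not a detail stated in the abstract.

```python
from datetime import date


def temporal_split(records: list[dict], cutoff: date) -> tuple[list[dict], list[dict]]:
    """Train on records collected before `cutoff`; validate on those on or after it."""
    train = [r for r in records if r["date"] < cutoff]
    valid = [r for r in records if r["date"] >= cutoff]
    return train, valid


def unseen_groups(train: list[dict], valid: list[dict], key: str = "country") -> set:
    """Random-effect groups that appear only in the validation period.

    In a standard grouped random-effects model these groups have no estimated
    effect, so their predictions fall back to the population-level (fixed) part.
    """
    return {r[key] for r in valid} - {r[key] for r in train}


# Hypothetical sequence records with patient-status labels.
records = [
    {"date": date(2021, 6, 1), "country": "A", "severe": 0},
    {"date": date(2021, 12, 1), "country": "B", "severe": 1},
    {"date": date(2022, 1, 15), "country": "A", "severe": 0},
]
train, valid = temporal_split(records, date(2021, 11, 1))
print(len(train), len(valid), unseen_groups(train, valid))  # 1 2 {'B'}
```

A temporal split like this mimics deployment, where a model trained on past variants must score new ones, such as Omicron.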
Affiliation(s)
- Bahrad A Sokhansanj
- Ecological and Evolutionary Signal Processing & Informatics Laboratory, Drexel University, 3100 Chestnut St., Philadelphia, PA, 19104, United States of America.
- Gail L Rosen
- Ecological and Evolutionary Signal Processing & Informatics Laboratory, Drexel University, 3100 Chestnut St., Philadelphia, PA, 19104, United States of America.
20
Cao Z, Li G. MStoCIRC: A powerful tool for downstream analysis of MS/MS data to predict translatable circRNAs. Front Mol Biosci 2022; 9:791797. [PMID: 36072432 PMCID: PMC9441560 DOI: 10.3389/fmolb.2022.791797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2021] [Accepted: 07/18/2022] [Indexed: 11/13/2022]
Abstract
CircRNAs are formed by a non-canonical splicing mechanism (back-splicing) and are circular in structure. CircRNAs are widely distributed in organisms and show time- and tissue-specific expression. CircRNAs have attracted increasing interest from scientists because of their non-negligible effects on the growth and development of organisms. The translation capability of circRNAs is a novel and valuable direction in circRNA functional research. To explore the translation potential of circRNAs, some progress has been made in both experimental identification and computational prediction. For computational prediction, CircCode and CircPro are ribosome profiling-based software applications for predicting translatable circRNAs, while the online databases riboCIRC and TransCirc integrate as many lines of evidence as possible and list high-confidence predicted translatable circRNAs. Meanwhile, mass spectrometry-based proteomics is widely recognized as an efficient way to identify protein and peptide sequences from diverse complex samples. However, few applications make full use of mass spectrometry to predict translatable circRNAs. This research therefore aims to build a scientific analysis pipeline with two salient features: 1) it starts with the analysis of raw tandem mass spectrometry data; and 2) it also incorporates other translation evidence such as IRESs. The pipeline has been packaged into an analysis tool called mass spectrometry to translatable circRNAs (MStoCIRC). MStoCIRC is implemented mainly in Python 3 and can be downloaded from GitHub (https://github.com/QUMU00/mstocirc-master). The tool consists of a main program and several small, independent function modules, which makes it more versatile. MStoCIRC processes data efficiently and has identified hundreds of translatable circRNAs in humans and Arabidopsis thaliana.
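One common line of evidence for circRNA translation is a peptide that can only arise from reading across the back-splice junction. The sketch below illustrates that idea and is not MStoCIRC's code: the function names are hypothetical, it assumes the standard genetic code, and it assumes the peptide is shorter than one full pass around the circle.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order ('*' = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna: str, frame: int) -> str:
    """Translate `rna` from offset `frame`, ignoring a trailing partial codon."""
    return "".join(
        CODON_TABLE.get(rna[i:i + 3], "X") for i in range(frame, len(rna) - 2, 3)
    )


def junction_spanning_hits(circ_seq: str, peptide: str):
    """Return (frame, nt_start) for matches of `peptide` that cross the junction.

    The circle is linearised as seq + seq so that reads wrapping past the
    last nucleotide (the back-splice junction) become contiguous.
    """
    seq = circ_seq.upper().replace("T", "U")
    n = len(seq)
    doubled = seq + seq
    hits = []
    for frame in range(3):
        protein = translate(doubled, frame)
        start = protein.find(peptide)
        while start != -1:
            nt_start = frame + 3 * start
            nt_end = nt_start + 3 * len(peptide)
            # Keep only first-pass matches whose nucleotide span covers position n.
            if nt_start < n < nt_end:
                hits.append((frame, nt_start))
            start = protein.find(peptide, start + 1)
    return hits


if __name__ == "__main__":
    print(junction_spanning_hits("GCUAAAAUG", "MA"))  # Met is encoded before the junction, Ala after it
```

A spectrum-matched peptide that maps only to such a junction-spanning position, and nowhere in the linear transcriptome, would be consistent with translation of the circular isoform.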
21
Ma R, Li S, Li W, Yao L, Huang HD, Lee TY. KinasePhos 3.0: Redesign and Expansion of the Prediction on Kinase-specific Phosphorylation Sites. Genomics Proteomics Bioinformatics 2022:S1672-0229(22)00081-X. [PMID: 35781048 PMCID: PMC10373160 DOI: 10.1016/j.gpb.2022.06.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/21/2021] [Revised: 05/30/2022] [Accepted: 06/27/2022] [Indexed: 06/04/2023]
Abstract
The purpose of this work is to enhance KinasePhos, a machine learning-based kinase-specific phosphorylation site prediction tool. Experimentally verified kinase-specific phosphorylation data were collected from PhosphoSitePlus, UniProtKB, the Group-based Prediction System 5.0, and Phospho.ELM. In total, 41,421 experimentally verified kinase-specific phosphorylation sites were identified. A total of 1380 unique kinases were identified, including 753 with existing classification information from KinBase and the remaining 627 annotated by building a phylogenetic tree. Based on this kinase classification, a total of 771 predictive models were built at the individual, family, and group levels, using at least 15 experimentally verified substrate sites in positive training datasets. The improved models demonstrated their effectiveness compared with other prediction tools. For example, the prediction of sites phosphorylated by the protein kinase B, casein kinase 2, and protein kinase A families had accuracies of 94.5%, 92.5%, and 90.0%, respectively. The average prediction accuracy for all 771 models was 87.2%. For enhancing interpretability, the SHapley Additive exPlanations (SHAP) method was employed to assess feature importance. The web interface of KinasePhos 3.0 has been redesigned to provide comprehensive annotations of kinase-specific phosphorylation sites on multiple proteins. Additionally, considering the large scale of phosphoproteomic data, a downloadable prediction tool is available at https://awi.cuhk.edu.cn/KinasePhos/download.html or https://github.com/tom-209/KinasePhos-3.0-executable-file.
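Kinase-specific site predictors of this kind typically represent each candidate serine, threonine, or tyrosine by the residues flanking it. The sketch below extracts such windows and one-hot encodes them as model input; the ±7-residue window, the `-` padding character, and the function names are illustrative assumptions, not KinasePhos 3.0's documented feature set.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_windows(protein: str, residues: str = "STY", flank: int = 7, pad: str = "-"):
    """Map each candidate site (1-based position) to its sequence window.

    The protein is padded at both ends so sites near the termini still
    yield windows of length 2 * flank + 1, centred on the site.
    """
    padded = pad * flank + protein + pad * flank
    windows = {}
    for i, aa in enumerate(protein):
        if aa in residues:
            # protein[i] sits at padded[i + flank]; slice flank residues either side.
            windows[i + 1] = padded[i:i + 2 * flank + 1]
    return windows


def one_hot(window: str):
    """Flatten a window into a 20-dimensional indicator per position (padding -> zeros)."""
    return [1 if aa == ref else 0 for aa in window for ref in AMINO_ACIDS]


if __name__ == "__main__":
    for pos, win in site_windows("MSKYPLT", flank=2).items():
        print(pos, win, sum(one_hot(win)))
```

Separate models at the individual, family, and group levels would then differ only in which experimentally verified windows are used as positives.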
Affiliation(s)
- Renfei Ma
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life Sciences, University of Science and Technology of China, Hefei 230027, China
- Shangfu Li
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Wenshuo Li
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Lantian Yao
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Hsien-Da Huang
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China.
- Tzong-Yi Lee
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China.
22
Zhong S, Feng J. CircPrimer 2.0: a software for annotating circRNAs and predicting translation potential of circRNAs. BMC Bioinformatics 2022; 23:215. [PMID: 35668371 PMCID: PMC9169404 DOI: 10.1186/s12859-022-04705-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 07/07/2021] [Accepted: 04/29/2022] [Indexed: 11/27/2022]
Abstract
Background Some circular RNAs (circRNAs) can be translated into functional peptides from small open reading frames (ORFs) in a cap-independent manner. Internal ribosome entry sites (IRESs) and N6-methyladenosine (m6A) have been reported to drive circRNA translation. Experimental methods for confirming IRESs and m6A sites are time-consuming and labor-intensive, and the lack of computational tools for predicting ORFs, IRESs and m6A sites in circRNAs makes this harder still. Results In this report, we present circPrimer 2.0, a Java-based software for annotating circRNAs and predicting their ORFs, IRESs, and m6A sites. circPrimer 2.0 has both a graphical and a command-line interface, so the tool can be embedded into an analysis pipeline. Conclusions circPrimer 2.0 is easy-to-use software for annotating circRNAs and predicting their translation potential, and is freely available at www.bio-inf.cn. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04705-y.
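Because a circRNA has no ends, an ORF can run through the back-splice junction or even continue around the circle without ever reaching a stop codon. The sketch below shows one way to scan for AUG-initiated ORFs with wrap-around reading; it is a hypothetical helper written for illustration, not circPrimer 2.0's algorithm, and it assumes the standard stop codons and an AUG start.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codon_at(seq: str, i: int) -> str:
    """Read the codon starting at index i, wrapping past the back-splice junction."""
    n = len(seq)
    return "".join(seq[(i + k) % n] for k in range(3))


def circular_orfs(seq: str, min_codons: int = 1):
    """Find AUG-initiated ORFs on a circular RNA (hypothetical helper).

    Returns (start, n_codons, crosses_junction) tuples. n_codons counts the
    AUG but excludes the stop codon, and is None when no in-frame stop is
    ever reached (a 'rolling-circle' ORF).
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    orfs = []
    for start in range(n):
        if codon_at(seq, start) != "AUG":
            continue
        n_codons = None
        # After n codons (3n nt) the reading position returns to `start`
        # in the same frame, so n codons cover everything reachable.
        for k in range(n):
            if codon_at(seq, start + 3 * k) in STOP_CODONS:
                n_codons = k
                break
        if n_codons is None:
            orfs.append((start, None, True))
        elif n_codons >= min_codons:
            end = start + 3 * (n_codons + 1)  # exclusive end, stop codon included
            orfs.append((start, n_codons, end > n))
    return orfs


if __name__ == "__main__":
    print(circular_orfs("UAGCCAUGCCC"))  # ORF starts at 5 and stops after the junction
    print(circular_orfs("AUGCC"))        # no stop codon: rolling-circle ORF
```

IRES and m6A predictions would then be used to judge whether such an ORF is plausibly translated cap-independently.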
Affiliation(s)
- Shanliang Zhong
- Center of Clinical Laboratory Science, The Affiliated Cancer Hospital of Nanjing Medical University & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing, 210009, China
- Jifeng Feng
- Department of Medical Oncology, The Affiliated Cancer Hospital of Nanjing Medical University & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Baiziting 42, Nanjing, 210009, China.
23
Liu Y, Shen Y, Wang H, Zhang Y, Zhu X. m5Cpred-XS: A New Method for Predicting RNA m5C Sites Based on XGBoost and SHAP. Front Genet 2022; 13:853258. [PMID: 35432446 PMCID: PMC9005994 DOI: 10.3389/fgene.2022.853258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/12/2022] [Accepted: 02/16/2022] [Indexed: 11/13/2022]
Abstract
As one of the most important post-transcriptional modifications of RNA, 5-methylcytosine (m5C) is reported to be closely related to many chemical reactions and biological functions in cells. Recently, several computational methods have been proposed for identifying m5C sites; however, their accuracy and efficiency are still not satisfactory. In this study, we propose a new method, m5Cpred-XS, for predicting m5C sites in H. sapiens, M. musculus, and A. thaliana. First, the SHAP method was used to select the optimal feature subset from seven different kinds of sequence-based features. Second, different machine learning algorithms were used to train the models. The results of five-fold cross-validation indicate that the XGBoost-based model achieved the highest prediction accuracy. Finally, comparison with other state-of-the-art models indicates that m5Cpred-XS outperforms them. Moreover, we deployed the model on a web server that can be accessed at http://m5cpred-xs.zhulab.org.cn/, and m5Cpred-XS is expected to be a useful tool for studying m5C sites.
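Five-fold cross-validation, used above to compare the candidate learners, can be sketched without any machine learning library. The helper name and the fixed shuffling seed are assumptions; the sketch only produces index splits, which a real pipeline would pass to its feature-selection and model-fitting steps fold by fold. In general, data-driven feature selection should be repeated inside each training fold so that held-out samples do not influence which features are kept.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) lists for k-fold cross-validation.

    Indices are shuffled once with a fixed seed and dealt round-robin into
    k folds, so fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    for train, test in k_fold_indices(12, k=5):
        print(len(train), len(test), test)
```

Each sample appears in exactly one test fold, so averaging a metric over the five folds uses every sample once for evaluation.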
Affiliation(s)
- Yong Zhang
- Correspondence: Xiaolei Zhu; Yong Zhang
24
Clift AK, Hippisley-Cox J, Dodwell D, Lord S, Brady M, Petrou S, Collins GS. Development and validation of clinical prediction models for breast cancer incidence and mortality: a protocol for a dual cohort study. BMJ Open 2022; 12:e050828. [PMID: 35351695 PMCID: PMC8961149 DOI: 10.1136/bmjopen-2021-050828] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/04/2021] [Accepted: 01/07/2022] [Indexed: 11/03/2022]
Abstract
INTRODUCTION Breast cancer is the most common cancer and the leading cause of cancer-related death in women worldwide. Risk prediction models may be useful to guide risk-reducing interventions (such as pharmacological agents) in women at increased risk or to inform early detection strategies such as screening. METHODS AND ANALYSIS The study will use data for women aged 20-90 years between 2000 and 2020 from QResearch linked at the individual level to hospital episodes, cancer registry and death registry data. It will evaluate a set of modelling approaches to predict the risk of developing breast cancer within the next 10 years, the 'combined' risk of developing a breast cancer and then dying from it within 10 years, and the risk of breast cancer mortality within 10 years of diagnosis. Cox proportional hazards, competing risks, random survival forest, deep learning and XGBoost models will be explored. Models will be developed on the entire dataset, with 'apparent' performance reported, and internal-external cross-validation used to assess performance and geographical and temporal transportability (two 10-year time periods). Random effects meta-analysis will pool discrimination and calibration metric estimates from individual geographical units obtained from internal-external cross-validation. We will then externally validate the models in an independent dataset. Performance heterogeneity will be evaluated throughout, for example by exploring performance across ethnic groups. ETHICS AND DISSEMINATION Ethics approval was granted by the QResearch scientific committee (reference number REC 18/EM/0400: OX129). The results will be written up for submission to peer-reviewed journals.
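The protocol pools per-region discrimination and calibration estimates with random-effects meta-analysis but does not name the between-region variance estimator. As an illustration only, the sketch below uses the DerSimonian-Laird method-of-moments estimator; the function name is hypothetical, and inputs are per-region estimates with their within-region variances. C-statistics are often pooled on the logit scale, a transformation that would be applied before calling this function.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate via the DerSimonian-Laird estimator.

    Returns (pooled, standard_error, tau2), where tau2 is the estimated
    between-region variance.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching variances")
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    sws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sws
    return pooled, math.sqrt(1.0 / sws), tau2


if __name__ == "__main__":
    print(dersimonian_laird([0.0, 4.0], [1.0, 1.0]))
```

When the regional estimates agree more closely than their variances predict, tau2 is truncated at zero and the result coincides with a fixed-effect pooled estimate.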
Affiliation(s)
- Ashley Kieran Clift
- Cancer Research UK Oxford Centre, University of Oxford, Oxford, UK
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Julia Hippisley-Cox
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- David Dodwell
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Simon Lord
- Department of Oncology, University of Oxford, Oxford, UK
- Mike Brady
- Department of Oncology, University of Oxford, Oxford, UK
- Stavros Petrou
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Gary S Collins
- Centre for Statistics in Medicine, University of Oxford, Oxford, UK
25
Reversal of G-Quadruplexes’ Role in Translation Control When Present in the Context of an IRES. Biomolecules 2022; 12:biom12020314. [PMID: 35204814 PMCID: PMC8869680 DOI: 10.3390/biom12020314] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/29/2021] [Revised: 02/08/2022] [Accepted: 02/14/2022] [Indexed: 02/01/2023]
Abstract
G-quadruplexes (GQs) are secondary nucleic acid structures that play regulatory roles in various cellular processes. G-quadruplex-forming sequences present within the 5′ UTR of mRNAs can function not only as repressors of translation but also as elements required for optimum function. Based upon previous reports, the majority of 5′ UTR GQ structures inhibit translation, presumably by blocking the ribosome scanning process that is essential for detection of the initiation codon. However, in certain mRNAs, GQs have been identified as positive regulators of translation because they are needed for translation initiation. While most cellular mRNAs use the 5′ cap structure for cap-dependent translation initiation, many rely on cap-independent translation under conditions in which cap-dependent initiation is impaired or slowed, for example during development, under stress and in many diseases. Cap-independent translation occurs mainly via internal ribosome entry sites (IRESs), which are located in the 5′ UTR of mRNAs and have structural features that can recruit the ribosome or other factors to initiate translation without a 5′ cap. In this review, we focus only on RNA GQs present in the 5′ UTR of mRNAs, where they play a critical role in translation initiation, and discuss the potential mechanism of this phenomenon, which is yet to be fully delineated.
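Putative G-quadruplex-forming sequences in a 5′ UTR are commonly located with a simple pattern: four runs of three or more guanines separated by loops of 1 to 7 nucleotides. The sketch below applies that canonical pattern with Python's `re` module; it is a generic search, not a method from this review, the function name is hypothetical, and it will miss non-canonical G4s (for example, those with longer loops or only two G-tetrads).

```python
import re

# Four G-tracts (>= 3 G each) separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGU]{1,7}){3}G{3,}")


def find_g4_motifs(utr: str):
    """Return (0-based start, motif) for non-overlapping canonical G4 motifs."""
    rna = utr.upper().replace("T", "U")
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(rna)]


if __name__ == "__main__":
    print(find_g4_motifs("AGGGUGGGUGGGUGGGA"))
```

A sequence-level hit only indicates folding potential; whether the structure forms, and whether it represses or supports translation in an IRES context, needs experimental testing.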
26
RNA-Binding Proteins as Regulators of Internal Initiation of Viral mRNA Translation. Viruses 2022; 14:v14020188. [PMID: 35215780 PMCID: PMC8879377 DOI: 10.3390/v14020188] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/01/2021] [Revised: 01/03/2022] [Accepted: 01/14/2022] [Indexed: 12/17/2022]
Abstract
Viruses are obligate intracellular parasites that depend on the host’s protein synthesis machinery to translate their mRNAs. The viral mRNA (vRNA) competes with host mRNA to recruit the translational machinery, including ribosomes, tRNAs, and the limited pool of eukaryotic translation initiation factors (eIFs). Many viruses use non-canonical strategies, such as targeting host eIFs and RNA elements known as internal ribosome entry sites (IRESs), to reprogram cellular gene expression and ensure preferential translation of vRNAs. In this review, we discuss vRNA IRES-mediated translation initiation, highlighting the role of RNA-binding proteins (RBPs), other than the canonical translation initiation factors, in regulating IRES activity.
27
Fukunishi M, Sasai S, Tojo M, Mochizuki T. Novel Fusari- and Toti-like Viruses, with Probable Different Origins, in the Plant Pathogenic Oomycete Globisporangium ultimum. Viruses 2021; 13:1931. [PMID: 34696361 PMCID: PMC8538416 DOI: 10.3390/v13101931] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/16/2021] [Revised: 09/19/2021] [Accepted: 09/22/2021] [Indexed: 01/01/2023]
Abstract
To further classify the oomycete viruses that have been discovered in recent years, we investigated virus infection in the plant-parasitic oomycete Globisporangium ultimum in Japan. Double-stranded RNA detection, high-throughput sequencing, and RT-PCR revealed that the G. ultimum isolate UOP226 contained two viruses related to fusarivirus and totivirus, named Pythium ultimum RNA virus 1 (PuRV1) and Pythium ultimum RNA virus 2 (PuRV2), respectively. Phylogenetic analysis of the deduced amino acid sequence of the RNA-dependent RNA polymerase (RdRp) showed that fusari-like PuRV1 belonged to a different phylogenetic group from Plasmopara viticola lesion-associated fusari virus (PvlaFV) 1-3 from the oomycete Plasmopara viticola. Codon usage bias of the PuRV1 RdRp gene was more similar to that of fungi than to those of Globisporangium and Phytophthora, suggesting that the PuRV1 ancestor was horizontally transmitted from fungi to the G. ultimum ancestor. Phylogenetic analysis of the deduced amino acid sequence of the RdRp of toti-like PuRV2 placed it in a monophyletic group with the other toti-like oomycete viruses from Globisporangium, Phytophthora, and Pl. viticola. However, the nucleotide sequences of toti-like oomycete viruses were not highly homologous, suggesting the possibility of convergent evolution of toti-like oomycete viruses.
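The abstract does not say which codon-usage statistic was compared. One standard measure is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal sketch, assuming the standard genetic code, an in-frame coding sequence, and skipping stop codons and single-codon amino acids (Met, Trp); the function name is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons, e.g. 'L' -> the six leucine codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def rscu(cds: str):
    """Relative synonymous codon usage for an in-frame coding sequence."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) == 1:
            continue  # stops and single-codon amino acids carry no synonymous choice
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    print(rscu("CUGCUGCUGCUA"))
```

Comparing a viral gene's RSCU profile with profiles computed from candidate host genomes (for example, by a distance between the vectors) is one way to ask which host lineage its codon usage most resembles.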
Affiliation(s)
- Miki Fukunishi
- Graduate School of Life and Environmental Sciences, Osaka Prefecture University, Sakai 599-8531, Japan
- Shinsaku Sasai
- Graduate School of Life and Environmental Sciences, Osaka Prefecture University, Sakai 599-8531, Japan
- Motoaki Tojo
- Graduate School of Life and Environmental Sciences, Osaka Prefecture University, Sakai 599-8531, Japan
- Tomofumi Mochizuki
- Graduate School of Life and Environmental Sciences, Osaka Prefecture University, Sakai 599-8531, Japan
28
Xiao Q, Yin R, Wang Y, Yang S, Ma A, Pan X, Zhu X. Comprehensive Analysis of Peripheral Exosomal circRNAs in Large Artery Atherosclerotic Stroke. Front Cell Dev Biol 2021; 9:685741. [PMID: 34239876 PMCID: PMC8257506 DOI: 10.3389/fcell.2021.685741] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/25/2021] [Accepted: 05/21/2021] [Indexed: 12/18/2022]
Abstract
Exosomes are crucial vehicles in intercellular communication. Circular RNAs (circRNAs), novel endogenous noncoding RNAs, play diverse roles in ischemic stroke. Recently, circRNAs have been shown to be abundant and stable in exosomes. However, a comprehensive analysis of exosomal circRNAs in large artery atherosclerotic (LAA) stroke has not yet been reported. We performed RNA sequencing (RNA-Seq) to comprehensively identify differentially expressed (DE) exosomal circRNAs in five pairs of LAA patients and normal controls. Quantitative real-time PCR (qRT-PCR) was then used to verify the RNA-Seq results in a cohort of stroke patients (32 versus 32). RNA-Seq identified a total of 462 circRNAs in peripheral exosomes, 25 of which were DE. Additionally, circRNA competing endogenous RNA (ceRNA) network and translatability analyses revealed potential functions of the exosomal circRNAs in LAA progression. Two ceRNA pathways involving 5 circRNAs, 2 miRNAs, and 3 mRNAs were confirmed by qRT-PCR. In the validation cohort, receiver operating characteristic (ROC) curve analysis identified two circRNAs as possible novel biomarkers, and logistic models combining two and four circRNAs increased the area under the curve compared with the individual circRNAs. Here, we present for the first time a comprehensive expression profile of exosomal circRNAs in LAA stroke and show their potential diagnostic value and biological functions.
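The ROC analysis above is summarized by the area under the curve (AUC). For a small validation cohort, AUC can be computed directly as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half; this equals the Mann-Whitney U statistic divided by the number of case-control pairs. A minimal sketch with a hypothetical function name; the scores could be, for example, circRNA expression levels or fitted probabilities from a logistic model.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for cases (e.g. stroke), 0 for controls. Ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic, which is negligible for a 32-versus-32 cohort; a rank-based formula gives the same value more efficiently for large samples.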
Affiliation(s)
- Qi Xiao
- Department of Neurology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Ruihua Yin
- Department of Neurology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Yuan Wang
- Department of Neurology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Shaonan Yang
- Department of Neurology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Aijun Ma
- Department of Neurology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Xudong Pan
- Department of Neurology, The Affiliated Hospital of Qingdao University, Qingdao, China; Institute of Cerebrovascular Diseases, The Affiliated Hospital of Qingdao University, Qingdao, China
- Xiaoyan Zhu
- Department of Critical Care Medicine, The Affiliated Hospital of Qingdao University, Qingdao, China
29
Yang TH, Wang CY, Tsai HC, Liu CT. Human IRES Atlas: an integrative platform for studying IRES-driven translational regulation in humans. Database (Oxford) 2021; 2021:6263636. [PMID: 33942874 PMCID: PMC8094437 DOI: 10.1093/database/baab025] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/18/2020] [Revised: 04/16/2021] [Accepted: 04/23/2021] [Indexed: 11/13/2022]
Abstract
It is now known that cap-independent translation initiation facilitated by internal ribosome entry sites (IRESs) is vital for selective cellular protein synthesis under stress and different physiological conditions. However, three problems make it hard to understand transcriptome-wide cellular IRES-mediated translation initiation mechanisms: (i) the complex interplay between IRESs and other translation initiation–related information, (ii) the limited reliability of in silico cellular IRES investigation and (iii) labor-intensive in vivo IRES identification. In this research, we constructed the Human IRES Atlas database for a comprehensive understanding of cellular IRESs in humans. First, currently available and suitable IRES prediction tools (IRESfinder, PatSearch and IRESpy) were used to obtain transcriptome-wide human IRESs. Then, we collected eight categories of translation initiation–related features to help study the potential molecular mechanisms of each putative IRES. Three functional tests (conservation, structural RNA–protein scores and conditional translation efficiency) were devised to evaluate the functionality of the identified putative IRESs. Moreover, an easy-to-use interface and an IRES–translation initiation interaction map for each gene transcript were implemented to help understand the interactions between IRESs and translation initiation–related features. Researchers can easily search/browse an IRES of interest using the web interface and deduce testable mechanism hypotheses of human IRES-driven translation initiation based on the integrated results. In summary, Human IRES Atlas integrates putative IRES elements and translation initiation–related experiments for better usage of these data and deduction of mechanism hypotheses. Database URL: http://cobishss0.im.nuk.edu.tw/Human_IRES_Atlas/
Affiliation(s)
- Tzu-Hsien Yang
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
- Chung-Yu Wang
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
- Hsiu-Chun Tsai
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
- Cheng-Tse Liu
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
30
Machine learning applied to serum and cerebrospinal fluid metabolomes revealed altered arginine metabolism in neonatal sepsis with meningoencephalitis. Comput Struct Biotechnol J 2021; 19:3284-3292. [PMID: 34188777 PMCID: PMC8207169 DOI: 10.1016/j.csbj.2021.05.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/15/2020] [Revised: 05/02/2021] [Accepted: 05/10/2021] [Indexed: 12/15/2022]
Abstract
Background Neonatal sepsis with meningoencephalitis is a common complication of sepsis, which is a leading cause of neonatal death and neurological dysfunction. Early identification of neonatal sepsis with meningoencephalitis is particularly important for reducing brain damage. We recruited 70 patients with neonatal sepsis, 42 of whom were diagnosed with meningoencephalitis, and collected cerebrospinal fluid (CSF) and serum samples. The purpose of this study was to identify markers related to neonatal sepsis with meningoencephalitis using unbiased metabolomics and machine learning-based analysis. Results We found that neonatal sepsis with meningoencephalitis was characterized mainly by significant decreases in the concentrations of homo-l-arginine, creatinine, and other arginine metabolites in serum and CSF, suggesting possible changes in nitric oxide synthesis. The antioxidants taurine and proline increased significantly in the serum of neonates with sepsis and meningoencephalitis, suggesting abnormal oxidative stress. Potentially harmful bile salts and aromatic compounds were significantly increased in the serum of the group with meningoencephalitis. We compared different machine learning methods and found that the lasso algorithm performed best. Combining the lasso and XGBoost algorithms successfully predicted the concentration of homo-l-arginine in CSF from the concentrations of metabolite markers in serum. Conclusions On the basis of machine learning combined with analysis of the serum and CSF metabolomes, we found metabolite markers related to neonatal sepsis with meningoencephalitis. The characteristics of neonatal sepsis with meningoencephalitis were manifested mainly by changes in arginine metabolism and related changes in creatinine metabolism.
31
Development of machine learning model for diagnostic disease prediction based on laboratory tests. Sci Rep 2021; 11:7567. [PMID: 33828178 PMCID: PMC8026627 DOI: 10.1038/s41598-021-87171-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/31/2020] [Accepted: 03/19/2021] [Indexed: 01/16/2023]
Abstract
The use of deep learning and machine learning (ML) in medical science is increasing, particularly in the visual, audio, and language data fields. We aimed to build a new optimized ensemble model by blending a DNN (deep neural network) model with two ML models for disease prediction using laboratory test results. Eighty-six attributes (laboratory tests) were selected from the datasets based on value counts, clinical importance, and missing values. We collected sample datasets of 5145 cases, including 326,686 laboratory test results. We investigated a total of 39 specific diseases based on International Classification of Diseases, 10th revision (ICD-10) codes. These datasets were used to construct light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) ML models and a DNN model using TensorFlow. The optimized ensemble model achieved an F1-score of 81% and a prediction accuracy of 92% for the five most common diseases. The deep learning and ML models showed differences in predictive power and disease classification patterns. We used a confusion matrix and analyzed feature importance using the SHAP value method. Our new ML model achieved highly efficient disease prediction through disease classification. This study will be useful in the prediction and diagnosis of diseases.
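The abstract does not describe how the DNN, LightGBM, and XGBoost outputs were blended or how F1 was averaged across diseases. As an assumption-laden sketch, one simple blend is a weighted average of each model's class probabilities, scored with macro-averaged F1 (per class, F1 = 2TP / (2TP + FP + FN), then averaged over classes). The weights, example probabilities, and function names are hypothetical.

```python
def blend(prob_vectors, weights):
    """Weighted average of per-class probability vectors from several models."""
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    return [
        sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
        for c in range(n_classes)
    ]


def argmax(values):
    """Index of the largest value (the predicted class)."""
    return max(range(len(values)), key=values.__getitem__)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical class probabilities from three models for one case.
    dnn, lgbm, xgb = [0.2, 0.7, 0.1], [0.5, 0.4, 0.1], [0.3, 0.5, 0.2]
    print(argmax(blend([dnn, lgbm, xgb], weights=[0.5, 0.25, 0.25])))
    print(macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))
```

Macro averaging weights rare and common diseases equally; a support-weighted average would instead favour the most frequent classes.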
32
Targeting the DEAD-Box RNA Helicase eIF4A with Rocaglates-A Pan-Antiviral Strategy for Minimizing the Impact of Future RNA Virus Pandemics. Microorganisms 2021; 9:microorganisms9030540. [PMID: 33807988 PMCID: PMC8001013 DOI: 10.3390/microorganisms9030540] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/29/2021] [Revised: 03/01/2021] [Accepted: 03/02/2021] [Indexed: 12/17/2022]
Abstract
The increase in pandemics caused by RNA viruses of zoonotic origin highlights the urgent need for broad-spectrum antivirals against novel and re-emerging RNA viruses. Broad-spectrum antivirals could be deployed as first-line interventions during an outbreak while virus-specific drugs and vaccines are developed and rolled out. Viruses depend on the host’s protein synthesis machinery for replication. Several natural compounds that target the cellular DEAD-box RNA helicase eIF4A, a key component of the eukaryotic translation initiation complex eIF4F, have emerged as potential broad-spectrum antivirals. Rocaglates, a group of flavaglines of plant origin that clamp mRNAs with highly structured 5′ untranslated regions (5′UTRs) onto the surface of eIF4A through specific stacking interactions, exhibit the greatest selectivity and highest potential therapeutic indices among all known eIF4A inhibitors. Their unique mechanism of action limits the inhibitory effect of rocaglates to the translation of eIF4A-dependent viral mRNAs and a minor fraction of host mRNAs exhibiting stable RNA secondary structures and/or polypurine sequence stretches in their 5′UTRs, resulting in minimal potential toxic side effects. Maintaining a favorable safety profile while efficiently inhibiting a broad spectrum of RNA viruses makes rocaglates primary candidates for further development as pan-antiviral therapeutics.
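The abstract links rocaglate sensitivity to polypurine stretches in 5′UTRs. A simple way to flag such stretches is to scan for maximal runs of A/G; the minimum run length of 6 nt and the function name are arbitrary assumptions for illustration, not a published rocaglate-sensitivity criterion.

```python
import re


def polypurine_runs(utr: str, min_len: int = 6):
    """Return (0-based start, run) for maximal A/G runs of at least `min_len` nt."""
    rna = utr.upper().replace("T", "U")
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(rna)]


if __name__ == "__main__":
    print(polypurine_runs("CCAGAGAGCCUGAAGGAAGAUU"))
```

Because the quantifier is greedy and matches are scanned left to right, each reported run is a whole purine stretch rather than a fragment of one.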
33
Using machine learning tools to predict outcomes for emergency department intensive care unit patients. Sci Rep 2020; 10:20919. [PMID: 33262471 PMCID: PMC7708467 DOI: 10.1038/s41598-020-77548-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/25/2020] [Accepted: 11/04/2020] [Indexed: 12/23/2022]
Abstract
The number of critically ill patients has increased globally along with the rise in emergency visits. Mortality prediction for critical patients is vital for emergency care because it affects the distribution of emergency resources. Traditional scoring systems are designed for all emergency patients using classic mathematical methods, but risk factors in critically ill patients have complex interactions, so traditional scoring cannot as readily apply to them. As an accurate model for predicting the mortality of critically ill emergency department patients is lacking, this study's objective was to develop a machine learning-based scoring system optimized for critical patients in emergency departments. We conducted a retrospective cohort study in a tertiary medical center in Beijing, China. Patients over 16 years old were included if they were alive when they entered the emergency department intensive care unit system between February 2015 and December 2015. Mortality up to 7 days after admission into the emergency department was the primary outcome, and 1624 cases were included to derive the models. Prospective factors included previous diseases, physiologic parameters, and laboratory results. Several machine learning models were built to predict 7-day mortality from these factors, and their predictive accuracy (sensitivity and specificity) was evaluated by the area under the curve (AUC). The AUCs were 0.794, 0.840, 0.849 and 0.822 for the SVM, GBDT, XGBoost and logistic regression models, respectively. In comparison with the SAPS 3 model (AUC = 0.826), the newer machine learning methods, XGBoost in particular, showed more reliable discriminatory capability for predicting outcomes for emergency department intensive care unit patients.
34
Identification of Two Novel Circular RNAs Deriving from BCL2L12 and Investigation of Their Potential Value as a Molecular Signature in Colorectal Cancer. Int J Mol Sci 2020; 21:ijms21228867. [PMID: 33238574 PMCID: PMC7709015 DOI: 10.3390/ijms21228867] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/26/2020] [Revised: 11/06/2020] [Accepted: 11/15/2020] [Indexed: 02/07/2023]
Abstract
The utility of circular RNAs (circRNAs) as molecular biomarkers has recently emerged. However, only a handful of circRNAs have been studied in colorectal cancer (CRC) so far. The purpose of this study was to identify new circRNAs derived from BCL2L12, a member of the apoptosis-related BCL2 family, and to investigate their potential as biomarkers in CRC. Total RNA extracts from CRC cell lines and tissue samples were reverse-transcribed. By combining PCR with divergent primers and nested PCR, followed by Sanger sequencing, we discovered two BCL2L12 circRNAs. Bioinformatic tools were then used to predict the interactions of these circRNAs with microRNAs (miRNAs) and RNA-binding proteins (RBPs). Following a PCR-based pre-amplification, real-time qPCR was carried out to quantify each circRNA in CRC samples and cell lines. Biostatistical analysis was used to assess their potential prognostic value in CRC. Both novel BCL2L12 circRNAs are predicted to interact with particular miRNAs and RBPs. Interestingly, circ-BCL2L12-2 expression is inversely associated with TNM stage, while circ-BCL2L12-1 overexpression is associated with shorter overall survival in CRC, particularly among TNM stage II patients. Overall, we identified two novel BCL2L12 circRNAs, one of which can further stratify TNM stage II patients into two subgroups with substantially different prognoses.
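Divergent primers face away from each other on the linear gene, so they yield a product only from a circular template, and that product necessarily contains the back-splice junction that Sanger sequencing then confirms. The Python sketch below models this idea with two hypothetical helpers operating on a linearized circRNA sequence; the helper names, coordinates, and toy sequence are illustrative assumptions, not the authors' primer design.

```python
def backsplice_junction(circ_seq: str, flank: int = 10) -> str:
    """Sequence across the back-splice junction of a circRNA.

    circ_seq is the circRNA linearized 5'->3', starting at the first nucleotide of
    its most upstream exon; in the circle its last nucleotide is joined to its first.
    The junction sequence is the last `flank` nt followed by the first `flank` nt.
    """
    if not 0 < flank <= len(circ_seq) // 2:
        raise ValueError("flank must be between 1 and half the circRNA length")
    return circ_seq[-flank:] + circ_seq[:flank]


def divergent_amplicon(circ_seq: str, fwd_start: int, rev_end: int) -> str:
    """Product of a divergent primer pair on a circular template.

    fwd_start is the 0-based start of the forward primer site and rev_end the
    exclusive end of the reverse primer site, in linearized coordinates. Divergent
    means rev_end <= fwd_start: on a linear template the primers point away from
    each other and give no product, but on the circle extension runs from
    fwd_start through the back-splice junction to rev_end.
    """
    if not 0 < rev_end <= fwd_start < len(circ_seq):
        raise ValueError("expected 0 < rev_end <= fwd_start < len(circ_seq)")
    return circ_seq[fwd_start:] + circ_seq[:rev_end]


circ = "AAAACCCCGGGGTTTT"  # toy linearized circRNA
print(backsplice_junction(circ, 4))      # TTTTAAAA
print(divergent_amplicon(circ, 10, 6))   # GGTTTTAAAACC
```

The amplicon contains the junction sequence, which is what Sanger sequencing of the PCR product would be checked against.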
35
Bi Y, Xiang D, Ge Z, Li F, Jia C, Song J. An Interpretable Prediction Model for Identifying N7-Methylguanosine Sites Based on XGBoost and SHAP. Mol Ther Nucleic Acids 2020; 22:362-372. [PMID: 33230441 PMCID: PMC7533297 DOI: 10.1016/j.omtn.2020.08.022] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 07/14/2020] [Accepted: 08/20/2020] [Indexed: 12/19/2022]
Abstract
Recent studies have increasingly shown that chemical modification of mRNA plays an important role in regulating gene expression. N7-methylguanosine (m7G) is a positively charged mRNA modification that is essential for efficient gene expression and cell viability. However, m7G has so far received little research attention. Bioinformatics tools can serve as auxiliary methods for identifying m7G sites in transcriptomes. In this study, we develop a novel, interpretable machine learning-based approach, termed XG-m7G, that distinguishes m7G sites from non-m7G sites using the XGBoost algorithm and six different types of sequence-encoding schemes. Both 10-fold and jackknife cross-validation tests indicate that XG-m7G outperforms iRNA-m7G. Moreover, using the SHAP algorithm, the framework also provides interpretations of the model's output and highlights the features most important for identifying m7G sites. XG-m7G is expected to serve as a useful tool and guide for researchers in future studies of mRNA modification sites.
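The abstract does not name the six sequence-encoding schemes used by XG-m7G. As one common example of this kind of encoding, the Python sketch below turns an RNA window around a candidate guanosine into a normalized k-mer frequency vector; the function name, the choice of k, and the handling of non-ACGU symbols are assumptions for illustration, not the published feature set.

```python
from itertools import product
from typing import List


def kmer_frequencies(seq: str, k: int = 2) -> List[float]:
    """Normalized k-mer frequencies over the RNA alphabet ACGU.

    Returns 4**k values in lexicographic k-mer order (AA, AC, ..., UU for k = 2).
    DNA input is accepted (T is converted to U); windows containing any other
    symbol, such as N, are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    # Frequencies sum to 1 unless no valid window was found.
    return [c / total if total else 0.0 for c in counts]


print(kmer_frequencies("ACGU", k=1))  # [0.25, 0.25, 0.25, 0.25]
```

Vectors from several encodings would typically be concatenated per window and passed to a gradient-boosting classifier such as XGBoost; that training step and the SHAP analysis are not shown here.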
Affiliation(s)
- Yue Bi
- School of Science, Dalian Maritime University, Dalian 116026, China
- Dongxu Xiang
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Zongyuan Ge
- Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
- Fuyi Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
36
Shi Y, Jia X, Xu J. The new function of circRNA: translation. Clin Transl Oncol 2020; 22:2162-2169. [PMID: 32449127 DOI: 10.1007/s12094-020-02371-1] [Citation(s) in RCA: 107] [Impact Index Per Article: 26.8] [Received: 12/29/2019] [Accepted: 05/01/2020] [Indexed: 12/17/2022]
Abstract
Circular RNAs (circRNAs) have been considered a special class of non-coding RNAs: covalently closed RNA molecules that lack 5' caps and 3' poly(A) tails and are generated by back-splicing of precursor mRNA. For a long time, circRNAs were thought to participate directly in various biological processes as functional RNAs. In recent years, a variety of circRNAs have been found to be translated, and the resulting peptides also play biological roles in the onset and progression of human disease. The discovery of these circRNAs and their encoded peptides has enriched genomics, helped us study the causes of disease, and promoted the development of biotechnology. This review summarizes research progress on the detection methods, translation initiation mechanisms, and functional mechanisms of circRNA-encoded peptides, with the goal of providing directions for discovering diagnostic and prognostic biomarkers and therapeutic targets for human disease.
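Because a circRNA has no ends, an open reading frame (ORF) can run through the back-splice junction, and an ORF with no in-frame stop codon can in principle be translated in a rolling-circle fashion. The Python sketch below scans a circular sequence for AUG-initiated ORFs under those assumptions (standard stop codons and a hypothetical minimum length); it illustrates the idea only and is not one of the detection tools covered by the review.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def circular_orfs(circ_seq: str, min_codons: int = 20) -> List[Tuple[int, Optional[int], bool]]:
    """Find AUG-initiated ORFs on a circular RNA, reading around the circle.

    Returns (start, length_nt, crosses_junction) tuples, where start is 0-based on
    the linearized circRNA and length_nt includes the stop codon. length_nt is None
    when no in-frame stop is ever reached (an endless, rolling-circle candidate).
    """
    seq = circ_seq.upper().replace("T", "U")
    n = len(seq)
    if n < 3:
        return []

    def codon_at(pos: int) -> str:
        # Wrap indices modulo n so codons can straddle the back-splice junction.
        return "".join(seq[(pos + i) % n] for i in range(3))

    orfs: List[Tuple[int, Optional[int], bool]] = []
    for start in range(n):
        if codon_at(start) != "AUG":
            continue
        length: Optional[int] = None
        # After 3 * n nucleotides both position and frame repeat, so stop scanning there.
        for offset in range(0, 3 * n, 3):
            if codon_at(start + offset) in STOP_CODONS:
                length = offset + 3
                break
        if length is None:
            orfs.append((start, None, True))  # an endless ORF always wraps the junction
        elif length // 3 >= min_codons:
            # The ORF crosses the junction if its last nucleotide lies past the linear end.
            orfs.append((start, length, start + length > n))
    return orfs


print(circular_orfs("AAAUAGAUG", min_codons=1))  # [(6, 9, True)]
print(circular_orfs("AUGCC", min_codons=1))      # [(0, None, True)]
```

Every AUG is reported separately, so internal in-frame AUGs yield nested ORFs. Sequence scanning alone also cannot capture cap-independent initiation signals such as IRES elements or m6A modifications, which a real analysis would need to consider.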
Affiliation(s)
- Y Shi
- Department of Gynecology, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, No.123, Tianfei Xiang, Mochou Road, Nanjing, 210004, China
- X Jia
- Department of Gynecology, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, No.123, Tianfei Xiang, Mochou Road, Nanjing, 210004, China
- J Xu
- Department of Gynecology, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, No.123, Tianfei Xiang, Mochou Road, Nanjing, 210004, China
37
Estimating the Growing Stem Volume of Chinese Pine and Larch Plantations based on Fused Optical Data Using an Improved Variable Screening Method and Stacking Algorithm. Remote Sens 2020; 12:871. [DOI: 10.3390/rs12050871] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/11/2022]
Abstract
Accurately estimating growing stem volume (GSV) is important for forest resource management. GSV estimation is affected by the remote sensing images, variable selection methods, and estimation algorithms used. Optical images have been widely used to model key attributes of forest stands, including GSV and aboveground biomass (AGB), because they are easily available, cover large areas, and are supported by mature data processing and analysis technologies. However, the low saturation level of optical data and the difficulty of selecting feature variables from optical images often impede improvements in estimation accuracy. In this research, two GaoFen-2 (GF-2) images, a Landsat 8 image, and fused images created by integrating GF-2 bands with the Landsat multispectral image using the Gram–Schmidt method were first used to derive various feature variables and obtain various datasets or data scenarios. A DC-FSCK approach, which integrates feature variable screening and a combination optimization procedure based on the distance correlation coefficient and the k-nearest neighbors (kNN) algorithm, was proposed and compared with stepwise regression analysis (SRA) and random forest (RF) for feature variable selection. DC-FSCK accounts for the correlation among feature variables and their combined effect, so that the selected variables can improve both the accuracy and the saturation level of GSV estimation. To validate the proposed approach, six estimation algorithms were examined and compared: multiple linear regression (MLR), kNN, support vector regression (SVR), RF, eXtreme Gradient Boosting (XGBoost), and Stacking. Overall, compared with the GF-2 and Landsat 8 images, the image fusing the GF-2 red band with the Landsat 8 multispectral image (Red_Landsat) improved the GSV estimation accuracy for Chinese pine and larch plantations. The Red_Landsat image also performed better than the other fused images (Pan_Landsat, Blue_Landsat, Green_Landsat, and Nir_Landsat).
For most combinations of datasets and estimation models, the proposed variable selection method DC-FSCK led to more accurate GSV estimates than SRA and RF. In addition, for most combinations of datasets and variable selection methods, the Stacking algorithm performed better than the other estimation models. More importantly, combining the fused Red_Landsat image with DC-FSCK and the Stacking algorithm gave the best GSV estimation performance, with the highest adjusted coefficients of determination (0.8127 and 0.6047) and the lowest relative root mean square errors (17.1% and 20.7%) for Chinese pine and larch, respectively. This study provides new insights into how to choose suitable optical images, variable selection methods, and optimal modeling algorithms for GSV estimation in Chinese pine and larch plantations.
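DC-FSCK screens variables with the distance correlation coefficient, and the results are reported as adjusted R² and relative RMSE. The Python/NumPy sketch below implements these three quantities under common textbook definitions: Székely's sample distance correlation, and rRMSE taken as RMSE divided by the mean observed GSV. The abstract does not give the study's exact formulas or screening thresholds, so these definitions and the function names are assumptions.

```python
import numpy as np


def distance_correlation(x, y) -> float:
    """Sample distance correlation (Székely et al.) between two 1-D variables."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distance matrices
    b = np.abs(y - y.T)
    # Double-centre: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = max(float((A * B).mean()), 0.0)  # guard against tiny negative round-off
    dvar_x = float((A * A).mean())
    dvar_y = float((B * B).mean())
    if dvar_x == 0.0 or dvar_y == 0.0:
        return 0.0  # a constant variable carries no information
    return float(np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)))


def adjusted_r2(y_obs, y_pred, n_predictors: int) -> float:
    """Adjusted coefficient of determination for a model with n_predictors variables."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_obs.size
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1))


def relative_rmse(y_obs, y_pred) -> float:
    """RMSE as a percentage of the mean observed value (assumed rRMSE definition)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))
    return float(100.0 * rmse / y_obs.mean())
```

Unlike Pearson correlation, distance correlation also registers nonlinear dependence: for x = [-2, -1, 0, 1, 2] and y = x², the Pearson coefficient is 0, but `distance_correlation` returns a positive value.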
38
Computational Identification and Analysis of Ubiquinone-Binding Proteins. Cells 2020; 9:cells9020520. [PMID: 32102444 PMCID: PMC7072731 DOI: 10.3390/cells9020520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/21/2020] [Revised: 02/21/2020] [Accepted: 02/21/2020] [Indexed: 12/15/2022]
Abstract
Ubiquinone is an important cofactor with vital and diverse roles in many biological processes. Ubiquinone-binding proteins (UBPs) are receptor proteins that dock with ubiquinones. Identifying and analyzing UBPs computationally will provide insights into the pathways associated with ubiquinones. In this work, we propose the first UBP predictor, UBPs-Pred. The optimal feature subset, selected from three categories of sequence-derived features, was fed into an extreme gradient boosting (XGBoost) classifier whose parameters were tuned by multi-objective particle swarm optimization (MOPSO). On independent validation, the predictor achieved considerable performance, with a Matthews correlation coefficient (MCC) of 0.517. We then analyzed the UBPs using bioinformatics methods, including statistics on binding-domain motifs and protein distribution, as well as gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses.
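The MCC reported above combines all four cells of the confusion matrix into a single value between -1 and 1, and it is often preferred over accuracy when positive and negative classes are imbalanced. A minimal Python sketch follows; the function name and the counts in the example are hypothetical, since the abstract does not report the study's confusion matrix.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


print(mcc_from_counts(tp=10, tn=10, fp=0, fn=0))  # 1.0 (perfect prediction)
print(mcc_from_counts(tp=5, tn=5, fp=5, fn=5))    # 0.0 (no better than chance)
```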