1
Paulo P, Cardoso M, Brandão A, Pinto P, Falconi A, Pinheiro M, Cerveira N, Silva R, Santos C, Pinto C, Peixoto A, Maia S, Teixeira MR. Genetic landscape of homologous recombination repair genes in early-onset/familial prostate cancer patients. Genes Chromosomes Cancer 2023; 62:710-720. [PMID: 37436117 DOI: 10.1002/gcc.23190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 06/19/2023] [Accepted: 06/28/2023] [Indexed: 07/13/2023] Open
Abstract
Prostate cancer (PrCa) is one of the three most frequent and deadliest cancers worldwide. The discovery of PARP inhibitors for the treatment of tumors with deleterious variants in homologous recombination repair (HRR) genes has placed PrCa on the roadmap of precision medicine. However, the overall contribution of HRR genes to the 10%-20% of carcinomas arising in men with early-onset/familial PrCa has not been fully clarified. We used targeted next-generation sequencing (T-NGS) covering eight HRR genes (ATM, BRCA1, BRCA2, BRIP1, CHEK2, NBN, PALB2, and RAD51C) and an analysis pipeline querying both small and large genomic variations to clarify their global and relative contribution to hereditary PrCa predisposition in a series of 462 early-onset/familial PrCa cases. Deleterious variants were found in 3.9% of the patients, with CHEK2 and ATM being the most frequently mutated genes (38.9% and 22.2% of the carriers, respectively), followed by PALB2 and NBN (11.1% of the carriers each), and finally by BRCA2, RAD51C, and BRIP1 (5.6% of the carriers each). Using the same NGS data, exonic rearrangements were found in two patients, one pathogenic in BRCA2 and one of unknown significance in BRCA1. These results help clarify the genetic heterogeneity that underlies PrCa predisposition in early-onset and familial disease.
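The percentages reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is a reader's consistency check, not the authors' analysis: the total of 18 carriers and the per-gene counts are assumptions inferred from the reported percentages and are not stated in the abstract.

```python
# Reader's consistency check; the counts below are inferred, not reported.
COHORT_SIZE = 462        # stated in the abstract
ASSUMED_CARRIERS = 18    # assumption: 18 / 462 rounds to the reported 3.9%

# Assumed per-gene carrier counts, chosen to reproduce the reported shares.
assumed_counts = {
    "CHEK2": 7, "ATM": 4, "PALB2": 2, "NBN": 2,
    "BRCA2": 1, "RAD51C": 1, "BRIP1": 1,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


carrier_rate = percent(ASSUMED_CARRIERS, COHORT_SIZE)
shares = {gene: percent(n, ASSUMED_CARRIERS) for gene, n in assumed_counts.items()}

if __name__ == "__main__":
    print(f"carrier rate: {carrier_rate}%")
    for gene, share in shares.items():
        print(f"{gene}: {share}% of carriers")
```

Under these assumed counts the per-gene shares sum to the assumed carrier total, which is consistent with the figures quoted above but does not confirm the underlying counts.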
Affiliation(s)
- Paula Paulo
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Marta Cardoso
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Andreia Brandão
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Pedro Pinto
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Ariane Falconi
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Manuela Pinheiro
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Nuno Cerveira
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Department of Laboratory Genetics, Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
- Rui Silva
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Catarina Santos
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Department of Laboratory Genetics, Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
- Carla Pinto
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Department of Laboratory Genetics, Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
- Ana Peixoto
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Department of Laboratory Genetics, Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
- Sofia Maia
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Medical Genetics Unit, Hospital Pediátrico de Coimbra, Centro Hospitalar e Universitário de Coimbra, Coimbra, Portugal
- Manuel R Teixeira
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP) /RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto) /Porto Comprehensive Cancer Center, Porto, Portugal
- Department of Laboratory Genetics, Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
- School of Medicine and Biomedical Sciences (ICBAS), University of Porto, Porto, Portugal
2
Peixoto J, Príncipe C, Pestana A, Osório H, Pinto MT, Prazeres H, Soares P, Lima RT. Using a Dual CRISPR/Cas9 Approach to Gain Insight into the Role of LRP1B in Glioblastoma. Int J Mol Sci 2023; 24:11285. [PMID: 37511044 PMCID: PMC10379115 DOI: 10.3390/ijms241411285] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/04/2023] [Revised: 06/27/2023] [Accepted: 07/04/2023] [Indexed: 07/30/2023] Open
Abstract
LRP1B remains one of the most altered genes in cancer, although its relevance in cancer biology is still unclear. Recent advances in gene editing techniques, particularly CRISPR/Cas9 systems, offer new opportunities to evaluate the function of large genes, such as LRP1B. Using a dual sgRNA CRISPR/Cas9 gene editing approach, this study aimed to assess the impact of disrupting LRP1B in glioblastoma cell biology. Four sgRNAs were designed for the dual targeting of two LRP1B exons (1 and 85). The U87 glioblastoma (GB) cell line was transfected with CRISPR/Cas9 PX459 vectors. To assess LRP1B-gene-induced alterations and expression, PCR, Sanger DNA sequencing, and qRT-PCR were carried out. Three clones (clones B9, E6, and H7) were further evaluated. All clones presented altered cellular morphology, increased cellular and nuclear size, and changes in ploidy. Two clones (E6 and H7) showed a significant decrease in cell growth, both in vitro and in the in vivo CAM assay. Proteomic analysis of the clones' secretome identified differentially expressed proteins that had not been previously associated with LRP1B alterations. This study demonstrates that the dual sgRNA CRISPR/Cas9 strategy can effectively edit LRP1B in GB cells, providing new insights into the impact of LRP1B deletions in GBM biology.
Grants
- PTDC/MEC-ONC/31520/2017 FEEI, FEDER through COMPETE 2020 -POCI, Portugal 2020, and by Portuguese funds through FCT/Ministério da Ciência, Tecnologia e Ensino Superior
- POCI-01-0145-FEDER-028779 (PTDC/BIA-MIC/28779/2017) FEEI, FEDER through COMPETE 2020 -POCI, Portugal 2020, and by Portuguese funds through FCT/Ministério da Ciência, Tecnologia e Ensino Superior
- project "Institute for Research and Innovation in Health Sciences" (UID/BIM/04293/2019) FEEI, FEDER through COMPETE 2020 -POCI, Portugal 2020, and by Portuguese funds through FCT/Ministério da Ciência, Tecnologia e Ensino Superior
- "Cancer Research on Therapy Resistance: From Basic Mechanisms to Novel Targets"-NORTE-01-0145-FEDER-000051 Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF)
- "The Porto Comprehensive Cancer Center" with the reference NORTE-01-0145-FEDER-072678 - Consórcio PORTO.CCC - Porto.Comprehensive Cancer Center Raquel Seruca European Regional Development Fund
- ROTEIRO/0028/2013; LISBOA-01-0145-FEDER-022125 Portuguese Mass Spectrometry Network, integrated in the National Roadmap of Research Infra-structures of Strategic Relevance
Affiliation(s)
- Joana Peixoto
- i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- Cancer Signaling and Metabolism Group, IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Rua Alfredo Allen 208, 4169-007 Porto, Portugal
- Catarina Príncipe
- i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- Cancer Signaling and Metabolism Group, IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Rua Alfredo Allen 208, 4169-007 Porto, Portugal
- Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Ana Pestana
- i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- Cancer Signaling and Metabolism Group, IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Rua Alfredo Allen 208, 4169-007 Porto, Portugal
- Hugo Osório
- i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, 4200-135 Porto, Portugal
- FMUP-Department of Pathology, Faculty of Medicine, University of Porto, Alameda Prof. Hernâni Monteiro, 4200-319 Porto, Portugal
- Marta Teixeira Pinto
- i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, 4200-135 Porto, Portugal
- Hugo Prazeres
- i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, 4200-135 Porto, Portugal
- Paula Soares
- i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- Cancer Signaling and Metabolism Group, IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Rua Alfredo Allen 208, 4169-007 Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, 4200-135 Porto, Portugal
- FMUP-Department of Pathology, Faculty of Medicine, University of Porto, Alameda Prof. Hernâni Monteiro, 4200-319 Porto, Portugal
- Raquel T Lima
- i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- Cancer Signaling and Metabolism Group, IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Rua Alfredo Allen 208, 4169-007 Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, 4200-135 Porto, Portugal
- FMUP-Department of Pathology, Faculty of Medicine, University of Porto, Alameda Prof. Hernâni Monteiro, 4200-319 Porto, Portugal
3
Chadha A, Dara R, Pearl DL, Sharif S, Poljak Z. Predictive analysis for pathogenicity classification of H5Nx avian influenza strains using machine learning techniques. Prev Vet Med 2023; 216:105924. [PMID: 37224663 DOI: 10.1016/j.prevetmed.2023.105924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 03/17/2023] [Accepted: 04/21/2023] [Indexed: 05/26/2023]
Abstract
Over the past decades, avian influenza (AI) outbreaks have been reported across different parts of the globe, resulting in large-scale economic and livestock loss and, in some cases raising concerns about their zoonotic potential. The virulence and pathogenicity of H5Nx (e.g., H5N1, H5N2) AI strains for poultry could be inferred through various approaches, and it has been frequently performed by detecting certain pathogenicity markers in their haemagglutinin (HA) gene. The utilization of predictive modeling methods represents a possible approach to exploring this genotypic-phenotypic relationship for assisting experts in determining the pathogenicity of circulating AI viruses. Therefore, the main objective of this study was to evaluate the predictive performance of different machine learning (ML) techniques for in-silico prediction of pathogenicity of H5Nx viruses in poultry, using complete genetic sequences of the HA gene. We annotated 2137 H5Nx HA gene sequences based on the presence of the polybasic HA cleavage site (HACS) with 46.33% and 53.67% of sequences previously identified as highly pathogenic (HP) and low pathogenic (LP), respectively. We compared the performance of different ML classifiers (e.g., logistic regression (LR) with the lasso and ridge regularization, random forest (RF), K-nearest neighbor (KNN), Naïve Bayes (NB), support vector machine (SVM), and convolutional neural network (CNN)) for pathogenicity classification of raw H5Nx nucleotide and protein sequences using a 10-fold cross-validation technique. We found that different ML techniques can be successfully used for the pathogenicity classification of H5 sequences with ∼99% classification accuracy. 
Our results indicate that for pathogenicity classification of (1) aligned deoxyribonucleic acid (DNA) and protein sequences, the NB classifier had the lowest accuracies of 98.41% (+/-0.89) and 98.31% (+/-1.06), respectively; (2) aligned DNA and protein sequences, the LR (L1/L2), KNN, SVM (radial basis function (RBF)) and CNN classifiers had the highest accuracies of 99.20% (+/-0.54) and 99.20% (+/-0.38), respectively; (3) unaligned DNA and protein sequences, CNNs achieved accuracies of 98.54% (+/-0.68) and 99.20% (+/-0.50), respectively. ML methods show potential for regular classification of H5Nx virus pathogenicity for poultry species, particularly when sequences containing regular markers were frequently present in the training dataset.
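The evaluation protocol named in this abstract, 10-fold cross-validation reported as mean accuracy with a spread, can be sketched in plain Python. Everything below is an illustrative assumption rather than the authors' pipeline: the toy peptides, the two motif strings, the Hamming-distance KNN, and all function names are made up for the example, and the spread is computed as a standard deviation, which the abstract does not specify.

```python
import random
from statistics import mean, stdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_dataset(n_per_class=20, seed=1):
    """Build made-up aligned peptides; label 1 = polybasic-like motif, 0 = not."""
    rng = random.Random(seed)
    cores = {1: "RERRRKKR", 0: "RETR----"}  # illustrative motifs only
    data = []
    for label, core in cores.items():
        for _ in range(n_per_class):
            flank = list("PQGLF")
            # Change one flanking residue at random to add some noise.
            flank[rng.randrange(len(flank))] = rng.choice(AMINO_ACIDS)
            data.append(("".join(flank[:2]) + core + "".join(flank[2:]), label))
    return data


def hamming(a, b):
    """Count mismatching positions between two equal-length aligned strings."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority label among the k training sequences closest to the query."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(data, k_folds=10, k_neighbors=3):
    """Return mean and standard deviation of per-fold accuracy."""
    accuracies = []
    for held_out in k_fold_indices(len(data), k_folds):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        correct = sum(
            knn_predict(train, data[i][0], k_neighbors) == data[i][1]
            for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return mean(accuracies), stdev(accuracies)


if __name__ == "__main__":
    acc_mean, acc_sd = cross_validate(make_toy_dataset())
    print(f"accuracy: {acc_mean:.2%} (+/-{acc_sd:.2%})")
```

Each sequence is held out exactly once, so every prediction is made by a model that never saw that sequence during training. The toy classes are trivially separable, so the accuracy here says nothing about real H5Nx data.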
Affiliation(s)
- Akshay Chadha
- School of Computer Science, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- Rozita Dara
- School of Computer Science, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- David L Pearl
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- Shayan Sharif
- Department of Pathobiology, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- Zvonimir Poljak
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario N1G 2W1, Canada
4
Leclerc J, Beaumont M, Vibert R, Pinson S, Vermaut C, Flament C, Lovecchio T, Delattre L, Demay C, Coulet F, Guillerm E, Hamzaoui N, Benusiglio PR, Brahimi A, Cornelis F, Delhomelle H, Fert-Ferrer S, Fournier BPJ, Hovnanian A, Legrand C, Lortholary A, Malka D, Petit F, Saurin JC, Lejeune S, Colas C, Buisine MP. AXIN2 germline testing in a French cohort validates pathogenic variants as a rare cause of predisposition to colorectal polyposis and cancer. Genes Chromosomes Cancer 2023; 62:210-222. [PMID: 36502525 PMCID: PMC10107344 DOI: 10.1002/gcc.23112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 11/23/2022] [Accepted: 11/29/2022] [Indexed: 12/14/2022] Open
Abstract
Only a few patients with germline AXIN2 variants and colorectal adenomatous polyposis or cancer have been described, raising questions about the actual contribution of this gene to colorectal cancer (CRC) susceptibility. To assess the clinical relevance for AXIN2 testing in patients suspected of genetic predisposition to CRC, we collected clinical and molecular data from the French Oncogenetics laboratories analyzing AXIN2 in this context. Between 2004 and June 2020, 10 different pathogenic/likely pathogenic AXIN2 variants were identified in 11 unrelated individuals. Eight variants were from a consecutive series of 3322 patients, which represents a frequency of 0.24%. However, loss-of-function AXIN2 variants were strongly associated with genetic predisposition to CRC as compared with controls (odds ratio: 11.89, 95% confidence interval: 5.103-28.93). Most of the variants were predicted to produce an AXIN2 protein devoid of the SMAD3-binding and DIX domains, but preserving the β-catenin-binding domain. Ninety-one percent of the AXIN2 variant carriers who underwent colonoscopy had adenomatous polyposis. Forty percent of the variant carriers developed colorectal or/and other digestive cancer. Multiple tooth agenesis was present in at least 60% of them. Our report provides further evidence for a role of AXIN2 in CRC susceptibility, arguing for AXIN2 testing in patients with colorectal adenomatous polyposis or cancer.
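The two statistics quoted in this abstract, a carrier frequency and an odds ratio with a 95% confidence interval, follow standard formulas that can be sketched briefly. The abstract does not give the case and control counts or say how the interval was computed, so the Woolf (log-scale) interval, the function name, and the 2x2 counts below are assumptions for illustration only. Only the 8 and 3322 come from the text.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Stated in the abstract: 8 variants in a consecutive series of 3322 patients.
series_frequency_percent = round(100 * 8 / 3322, 2)

# Hypothetical 2x2 counts, for illustration only (not taken from the paper).
example_or, example_low, example_high = odds_ratio_with_ci(10, 90, 5, 95)

if __name__ == "__main__":
    print(f"frequency in consecutive series: {series_frequency_percent}%")
    print(f"example OR: {example_or:.2f} (95% CI {example_low:.2f}-{example_high:.2f})")
```

The frequency line reproduces the 0.24% quoted above. The example odds ratio uses invented counts and is not comparable to the reported 11.89.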
Affiliation(s)
- Julie Leclerc
- Univ. Lille, CNRS, Inserm, CHU Lille, UMR9020-U1277 CANTHER - Cancer Heterogeneity Plasticity and Resistance to Therapies, Lille, France
- Molecular Oncogenetics, Department of Biochemistry and Molecular Biology, Lille University Hospital, Lille, France
- Marie Beaumont
- Laboratoire de Génétique Moléculaire et Génomique, CHU Rennes, Rennes, France
- Roseline Vibert
- UF d'Oncogénétique Clinique, Département de Génétique et Institut Universitaire de Cancérologie, Hôpitaux Pitié-Salpêtrière et Saint-Antoine, AP-HP. Sorbonne Université, Paris, France
- Stéphane Pinson
- Human Genetics Department, Hospices Civils de Lyon, Lyon, France
- Catherine Vermaut
- Molecular Oncogenetics, Department of Biochemistry and Molecular Biology, Lille University Hospital, Lille, France
- Cathy Flament
- Molecular Oncogenetics, Department of Biochemistry and Molecular Biology, Lille University Hospital, Lille, France
- Tonio Lovecchio
- Molecular Oncogenetics, Department of Biochemistry and Molecular Biology, Lille University Hospital, Lille, France
- Lucie Delattre
- Molecular Oncogenetics, Department of Biochemistry and Molecular Biology, Lille University Hospital, Lille, France
- Christophe Demay
- Bioinformatics Unit, Molecular Biology Facility, Lille University Hospital, Lille, France
- Florence Coulet
- Sorbonne University, INSERM, Saint-Antoine Research Center, Microsatellites instability and Cancer, CRSA, Genetics Department, AP-HP, Hôpital Pitié Salpêtrière, Sorbonne University, Paris, France
- Erell Guillerm
- Sorbonne University, INSERM, Saint-Antoine Research Center, Microsatellites instability and Cancer, CRSA, Genetics Department, AP-HP, Hôpital Pitié Salpêtrière, Sorbonne University, Paris, France
- Nadim Hamzaoui
- Service de Génétique et Biologie Moléculaires, Hôpital Cochin, AP-HP Centre, Université de Paris, and INSERM UMR_S1016, Institut Cochin, Université de Paris, Paris, France
- Patrick R Benusiglio
- UF d'Oncogénétique Clinique, Département de Génétique et Institut Universitaire de Cancérologie, Hôpitaux Pitié-Salpêtrière et Saint-Antoine, AP-HP. Sorbonne Université, Paris, France
- François Cornelis
- Department of Genetics-Oncogénétics-Prevention, Clermont-Ferrand Hospital, Clermont-Auvergne University, Clermont Ferrand, France
- Hélène Delhomelle
- Department of Genetics, Curie Institute, Paris Sciences & Lettres Research University, Paris, France
- Benjamin P J Fournier
- Centre de Recherche des Cordeliers, University of Paris, Sorbonne University, INSERM UMRS 1138 - Molecular Oral Pathophysiology, Paris, France
- Dental Faculty Garanciere, Oral Biology Department, Centre of Reference for Oral and Dental Rare Diseases, AP-HP, University of Paris, Paris, France
- Alain Hovnanian
- INSERM UMR 1163 - Laboratory of Genetic Skin Diseases, Imagine Institute, Paris, France
- University of Paris, Paris, France
- Department of Genetics, Necker Hospital for sick children, AP-HP, Paris, France
- Clémentine Legrand
- Service de Génétique, Génomique et Procréation, CHU Grenoble Alpes, Grenoble, France
- Alain Lortholary
- Centre Catherine de Sienne, hôpital privé du Confluent, Nantes, France
- David Malka
- Department of Cancer Medicine, Gustave Roussy, Paris-Saclay University, INSERM UMR 1279 - Unité Dynamique des Cellules Tumorales, Villejuif, France
- Florence Petit
- Clinique de Génétique, CHU Lille, Lille, France
- Univ. Lille, EA7364 - RADEME, CHU Lille, Lille, France
- Chrystelle Colas
- Department of Genetics, Curie Institute, Paris Sciences & Lettres Research University, Paris, France
- Marie-Pierre Buisine
- Univ. Lille, CNRS, Inserm, CHU Lille, UMR9020-U1277 CANTHER - Cancer Heterogeneity Plasticity and Resistance to Therapies, Lille, France
- Molecular Oncogenetics, Department of Biochemistry and Molecular Biology, Lille University Hospital, Lille, France
5
Abstract
Biology has become a data driven science largely due to the technological advances that have generated large volumes of data. To extract meaningful information from these data sets requires the use of sophisticated modeling approaches. Toward that, artificial neural network (ANN) based modeling is increasingly playing a very important role. The "black box" nature of ANNs acts as a barrier in providing biological interpretation of the model. Here, the basic steps toward building models for biological systems and interpreting them using calliper randomization approach to capture complex information are described.
6
Williams JL, Paudyal A, Awad S, Nicholson J, Grzesik D, Botta J, Meimaridou E, Maharaj AV, Stewart M, Tinker A, Cox RD, Metherell LA. Mylk3 null C57BL/6N mice develop cardiomyopathy, whereas Nnt null C57BL/6J mice do not. Life Sci Alliance 2020; 3:3/4/e201900593. [PMID: 32213617 PMCID: PMC7103425 DOI: 10.26508/lsa.201900593] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/30/2019] [Revised: 03/10/2020] [Accepted: 03/10/2020] [Indexed: 12/30/2022] Open
Abstract
The C57BL/6J and C57BL/6N mice have well-documented phenotypic and genotypic differences, including the infamous nicotinamide nucleotide transhydrogenase (Nnt) null mutation in the C57BL/6J substrain, which has been linked to cardiovascular traits in mice and cardiomyopathy in humans. To assess whether Nnt loss alone causes a cardiovascular phenotype, we investigated the C57BL/6N, C57BL/6J mice and a C57BL/6J-BAC transgenic rescuing NNT expression, at 3, 12, and 18 mo. We identified a modest dilated cardiomyopathy in the C57BL/6N mice, absent in the two B6J substrains. Immunofluorescent staining of cardiomyocytes revealed eccentric hypertrophy in these mice, with defects in sarcomere organisation. RNAseq analysis identified differential expression of a number of cardiac remodelling genes commonly associated with cardiac disease segregating with the phenotype. Variant calling from RNAseq data identified a myosin light chain kinase 3 (Mylk3) mutation in C57BL/6N mice, which abolishes MYLK3 protein expression. These results indicate the C57BL/6J Nnt-null mice do not develop cardiomyopathy; however, we identified a null mutation in Mylk3 as a credible cause of the cardiomyopathy phenotype in the C57BL/6N.
Affiliation(s)
- Jack L Williams
- Centre for Endocrinology, William Harvey Research Institute, Charterhouse Square, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Anju Paudyal
- Medical Research Council Harwell Institute, Mary Lyon Centre, Harwell Campus, Oxfordshire, UK
- Sherine Awad
- Centre for Endocrinology, William Harvey Research Institute, Charterhouse Square, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- James Nicholson
- Centre for Endocrinology, William Harvey Research Institute, Charterhouse Square, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Dominika Grzesik
- Centre for Endocrinology, William Harvey Research Institute, Charterhouse Square, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Joaquin Botta
- Centre for Endocrinology, William Harvey Research Institute, Charterhouse Square, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Eirini Meimaridou
- School of Human Sciences, London Metropolitan University, London, UK
- Avinaash V Maharaj
- Centre for Endocrinology, William Harvey Research Institute, Charterhouse Square, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Michelle Stewart
- Medical Research Council Harwell Institute, Mary Lyon Centre, Harwell Campus, Oxfordshire, UK
- Andrew Tinker
- William Harvey Heart Centre, William Harvey Research Institute, Charterhouse Square, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Roger D Cox
- Medical Research Council Harwell Institute, Mammalian Genetics Unit, Harwell Campus, Oxfordshire, UK
- Lou A Metherell
- Centre for Endocrinology, William Harvey Research Institute, Charterhouse Square, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
7
Ectopic expression of the Stabilin2 gene triggered by an intracisternal A particle (IAP) element in DBA/2J strain of mice. Mamm Genome 2020; 31:2-16. [PMID: 31912264 PMCID: PMC7060167 DOI: 10.1007/s00335-019-09824-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/01/2019] [Accepted: 12/29/2019] [Indexed: 12/21/2022]
Abstract
Stabilin2 (Stab2) encodes a large transmembrane protein which is predominantly expressed in the liver sinusoidal endothelial cells (LSECs) and functions as a scavenger receptor for various macromolecules including hyaluronans (HA). In DBA/2J mice, plasma HA concentration is ten times higher than in 129S6 or C57BL/6J mice, and this phenotype is genetically linked to the Stab2 locus. Stab2 mRNA in the LSECs was significantly lower in DBA/2J than in 129S6, leading to reduced STAB2 proteins in the DBA/2J LSECs. We found a retrovirus-derived transposable element, intracisternal A particle (IAP), in the promoter region of Stab2DBA which likely interferes with normal expression in the LSECs. In contrast, in other tissues of DBA/2J mice, the IAP drives high ectopic Stab2DBA transcription starting within the 5′ long terminal repeat of IAP in a reverse orientation and continuing through the downstream Stab2DBA. Ectopic transcription requires the Stab2-IAP element but is dominantly suppressed by the presence of loci on 59.7–73.0 Mb of chromosome (Chr) 13 from C57BL/6J, while the same region in 129S6 requires additional loci for complete suppression. Chr13:59.9–73 Mb contains a large number of genes encoding Krüppel-associated box-domain zinc-finger proteins that target transposable elements-derived sequences and repress their expression. Despite the high amount of ectopic Stab2DBA transcript in tissues other than liver, STAB2 protein was undetectable and unlikely to contribute to the plasma HA levels of DBA/2J mice. Nevertheless, the IAP insertion and its effects on the transcription of the downstream Stab2DBA exemplify that stochastic evolutional events could significantly influence susceptibility to complex but common diseases.
8
Zhang Y, Xie R, Wang J, Leier A, Marquez-Lago TT, Akutsu T, Webb GI, Chou KC, Song J. Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework. Brief Bioinform 2019; 20:2185-2199. [PMID: 30351377 PMCID: PMC6954445 DOI: 10.1093/bib/bby079] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 06/29/2018] [Revised: 07/28/2018] [Accepted: 08/01/2018] [Indexed: 11/15/2022] Open
Abstract
As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of different features and machine learning (ML) methods that are required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed [Light Gradient Boosting Machine (LightGBM)] are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/. 
We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
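The abstract reports that combining single-method models through ensemble learning improved performance, measured by AUC. A minimal sketch of that idea follows. It assumes simple score averaging as the ensembling rule, which the abstract does not specify, and the labels, scores, and function names are all made up for illustration.

```python
def average_ensemble(score_lists):
    """Average per-sample scores across models (one simple way to ensemble)."""
    return [sum(scores) / len(scores) for scores in zip(*score_lists)]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores for two hypothetical single-method models.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]
model_b = [0.6, 0.8, 0.3, 0.2, 0.4, 0.3]
ensemble = average_ensemble([model_a, model_b])

if __name__ == "__main__":
    for name, scores in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
        print(f"{name}: AUC = {auc(labels, scores):.3f}")
```

In this toy data the two models misrank different samples, so averaging their scores ranks every positive above every negative. That outcome is a property of the chosen numbers and is not guaranteed in general.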
Affiliation(s)
- Yanju Zhang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Ruopeng Xie
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Jiawei Wang
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tatiana T Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, VIC 3800, Australia
| |
Collapse
|
9
|
Meneghetti G, Skobo T, Chrisam M, Facchinello N, Fontana CM, Bellesso S, Sabatelli P, Raggi F, Cecconi F, Bonaldo P, Dalla Valle L. The epg5 knockout zebrafish line: a model to study Vici syndrome. Autophagy 2019; 15:1438-1454. [PMID: 30806141 DOI: 10.1080/15548627.2019.1586247] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/25/2022]
Abstract
The EPG5 protein is a RAB7A effector involved in fusion specificity between autophagosomes and late endosomes or lysosomes during macroautophagy/autophagy. Mutations in the human EPG5 gene cause a rare and severe multisystem disorder called Vici syndrome. In this work, we show that zebrafish epg5-/- mutants from both heterozygous and incrossed homozygous matings are viable and can develop to the age of sexual maturity without conspicuous defects in external appearance. In agreement with the dysfunctional autophagy of Vici syndrome, western blot revealed higher levels of the Lc3-II autophagy marker in epg5-/- mutants than in wild-type controls. Moreover, starvation elicited higher accumulation of Lc3-II in epg5-/- than in wild-type larvae, together with a significant reduction of skeletal muscle birefringence. Accordingly, muscle ultrastructural analysis revealed accumulation of degradation-defective autolysosomes in starved epg5-/- mutants. With aging, epg5-/- mutants showed impaired motility and muscle thinning, together with accumulation of non-degradative autophagic vacuoles. Furthermore, epg5-/- adults displayed morphological alterations in gonads and heart. These findings point to the zebrafish epg5 mutant as a valuable model for EPG5-related disorders, thus providing a new tool for dissecting the contribution of EPG5 to the onset and progression of Vici syndrome as well as for the screening of autophagy-stimulating drugs.
Abbreviations: ATG: autophagy related; cDNA: complementary DNA; DIG: digoxigenin; dpf: days post-fertilization; EGFP: enhanced green fluorescent protein; EPG: ectopic P granules; GFP: green fluorescent protein; hpf: hours post-fertilization; IL1B: interleukin 1 beta; Lc3-II: lipidated Lc3; mpf: months post-fertilization; mRNA: messenger RNA; NMD: nonsense-mediated mRNA decay; PCR: polymerase chain reaction; qPCR: real time-polymerase chain reaction; RAB7A/RAB7: RAB7a, member RAS oncogene family; RACE: rapid amplification of cDNA ends; RFP: red fluorescent protein; RT-PCR: reverse transcriptase-polymerase chain reaction; SEM: standard error of the mean; sgRNA: guide RNA; UTR: untranslated region; WMISH: whole mount in situ hybridization; WT: wild type.
Affiliation(s)
- Tatjana Skobo
- Department of Biology, University of Padova, Padova, Italy
- Martina Chrisam
- Department of Molecular Medicine, University of Padova, Padova, Italy
- Stefania Bellesso
- Department of Molecular Medicine, University of Padova, Padova, Italy
- Patrizia Sabatelli
- Institute of Molecular Genetics, National Research Council of Italy, Bologna, Italy
- IRCCS-Rizzoli Orthopedic Institute, Bologna, Italy
- Flavia Raggi
- Department of Biology, University of Padova, Padova, Italy
- Francesco Cecconi
- Department of Biology, University of Rome Tor Vergata, Roma, Italy
- Department of Pediatric Hematology and Oncology, Istituto di Ricovero e Cura a Carattere Scientifico Bambino Gesù Children's Hospital, Rome, Italy
- Unit of Cell Stress and Survival, Danish Cancer Society Research Center, Copenhagen, Denmark
- Paolo Bonaldo
- Department of Molecular Medicine, University of Padova, Padova, Italy
|
10
|
Fu X, Zhu W, Cai L, Liao B, Peng L, Chen Y, Yang J. Improved Pre-miRNAs Identification Through Mutual Information of Pre-miRNA Sequences and Structures. Front Genet 2019; 10:119. [PMID: 30858864 PMCID: PMC6397858 DOI: 10.3389/fgene.2019.00119] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/15/2018] [Accepted: 02/04/2019] [Indexed: 11/30/2022]
Abstract
Playing critical roles as post-transcriptional regulators, microRNAs (miRNAs) are a family of short non-coding RNAs that are derived from longer transcripts called precursor miRNAs (pre-miRNAs). Experimental methods to identify pre-miRNAs are expensive and time-consuming, which creates a need for computational alternatives. In recent years, the accuracy of computational methods to predict pre-miRNAs has increased significantly. However, several drawbacks remain. First, these methods usually consider only base frequencies or sequence information while ignoring the information between bases. Second, feature extraction methods based on secondary structures usually consider only the global characteristics while ignoring the mutual influence of the local structures. Third, methods integrating high-dimensional feature information are computationally inefficient. In this study, we have proposed a novel mutual information-based feature representation algorithm for pre-miRNA sequences and secondary structures, which is capable of capturing the interactions between sequence bases and local features of the RNA secondary structure. In addition, the feature space is smaller than that of most popular methods, which makes our method computationally more efficient than its competitors. Finally, we applied these features to train a support vector machine model to predict pre-miRNAs and compared the results with other popular predictors. Our method outperforms the others on both 5-fold cross-validation and the Jackknife test.
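To make the idea of "information between bases" concrete, here is a minimal sketch that estimates the mutual information, in bits, between each base and its right-hand neighbour within a single sequence. The function name and the restriction to adjacent positions are assumptions for illustration; the feature representation in the paper itself also covers secondary-structure elements and is not reproduced here.

```python
import math
from collections import Counter


def adjacent_mutual_information(seq):
    """Mutual information (bits) between a base and its right-hand neighbour."""
    pairs = [(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (left, right) pairs
    left = Counter(x for x, _ in pairs)    # marginal counts of the left base
    right = Counter(y for _, y in pairs)   # marginal counts of the right base
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) * p(y)) written with raw counts
        mi += p_xy * math.log2(p_xy * n * n / (left[x] * right[y]))
    return mi
```

A perfectly alternating sequence scores high because each base determines its neighbour, while a homopolymer scores zero because there is no uncertainty to reduce.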
Affiliation(s)
- Xiangzheng Fu
- College of Information Science and Engineering, Hunan University, Changsha, China
- Wen Zhu
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, China
- Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yifan Chen
- College of Information Science and Engineering, Hunan University, Changsha, China
- Jialiang Yang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
|
11
|
Biochemical and cellular consequences of the antithrombin p.Met1? mutation identified in a severe thrombophilic family. Oncotarget 2018; 9:33202-33214. [PMID: 30237862 PMCID: PMC6145704 DOI: 10.18632/oncotarget.26059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2017] [Accepted: 07/31/2018] [Indexed: 11/25/2022]
Abstract
Nature is always the best inspiration for basic research. A family with severe thrombosis and a deficiency of antithrombin, the strongest anticoagulant, carried a new mutation affecting the translation-start codon of SERPINC1, the gene encoding antithrombin. Expression of this variant in a eukaryotic cell system produced three different antithrombins. Two downstream methionines were used as alternative initiation codons, generating highly expressed small aglycosylated antithrombins with cytoplasmic localization. Wild-type antithrombin was generated by the use of the mutated AUU as initiation codon. In fact, any codon except the three stop codons might be used to initiate translation in this strong Kozak context. We show unexpected consequences of natural mutations affecting translation-start codons. Downstream alternative initiation AUG codons may be used when the start codon is mutated, generating smaller molecules with potentially different cell localization, biochemical features and unexplored consequences. Additionally, our data further support the use of codons other than AUG for initiation of translation in eukaryotes.
|
12
|
Matsuo M, Awano H, Matsumoto M, Nagai M, Kawaguchi T, Zhang Z, Nishio H. Dystrophin Dp116: A yet to Be Investigated Product of the Duchenne Muscular Dystrophy Gene. Genes (Basel) 2017; 8:genes8100251. [PMID: 28974057 PMCID: PMC5664101 DOI: 10.3390/genes8100251] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 08/11/2017] [Accepted: 09/26/2017] [Indexed: 12/12/2022]
Abstract
The Duchenne muscular dystrophy (DMD) gene is one of the largest genes in the human genome. The gene exhibits a complex arrangement of seven alternative promoters, which drive the expression of three full length and four shorter isoforms. Dp116, the second smallest product of the DMD gene, is a Schwann cell-specific isoform encoded by a transcript corresponding to DMD exons 56–79, starting from a promoter/exon S1 within intron 55. The physiological roles of Dp116 are poorly understood, because of its extensive homology with other isoforms and its expression in specific tissues. This review summarizes studies on Dp116, focusing on clinical findings and alternative activation of the upstream translation initiation codon that is predicted to produce Dp118.
Affiliation(s)
- Masafumi Matsuo
- Department of Physical Therapy, Faculty of Rehabilitation, Kobe Gakuin University, Kobe 651-2180, Japan
- Hiroyuki Awano
- Department of Pediatrics, Kobe University Graduate School of Medicine, Kobe 650-0017, Japan
- Masaaki Matsumoto
- Department of Pediatrics, Kobe University Graduate School of Medicine, Kobe 650-0017, Japan
- Masashi Nagai
- Department of Pediatrics, Kobe University Graduate School of Medicine, Kobe 650-0017, Japan
- Tatsuya Kawaguchi
- Biomedical Analysis and Pathology Research Group, Discovery Science and Technology Department, Daiichi Sankyo RD Novare Co., Tokyo 134-8630, Japan
- Zhujun Zhang
- Department of Physical Therapy, Faculty of Rehabilitation, Kobe Gakuin University, Kobe 651-2180, Japan
- Hisahide Nishio
- Department of Community Medicine and Social Healthcare Sciences, Kobe University Graduate School of Medicine, Kobe 650-0017, Japan
|
13
|
Abstract
Background More than one third of human genes are regulated by microRNAs. Identifying microRNAs (miRNAs) is a precondition for discovering their regulatory mechanisms and for developing cures for genetic diseases. The traditional identification method is biological experiment, which suffers from long duration, high cost, and a tendency to miss miRNAs that exist only in a specific period or at low expression levels. Machine learning methods are therefore applied to identify miRNAs. Results In this study, to distinguish real from pseudo miRNAs and to classify different species, we extracted 98-dimensional features based on primary and secondary structure. We then proposed the BP-Adaboost method, which addresses the overfitting of the BP neural network by constructing multiple BP neural network classifiers and assigning weights to these classifiers. On the four evaluation terms, the proposed method achieved a great improvement in identifying true pre-RNA compared with other methods. In identifying the species of pre-RNA, it also achieved higher accuracy than other algorithms. Conclusions The BP-Adaboost method achieved more than 98% accuracy in identifying real and pseudo miRNAs, much higher than not only BP but also many other algorithms. In the second experiment, restricted by the data, the algorithm could not reach high accuracy in identifying 7 species, but it still performed better than other algorithms.
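The weighting of multiple classifiers described above can be illustrated with one round of standard discrete AdaBoost. This is a hedged sketch of the textbook update only: the function name is hypothetical, labels are assumed to be +1 or -1, and the weak learner's predictions are taken as given rather than produced by a BP neural network as in the paper.

```python
import math


def adaboost_round(weights, labels, predictions):
    """One discrete AdaBoost round; labels and predictions are +1 or -1.

    Returns the classifier weight (alpha) and the renormalised sample weights.
    """
    # Weighted error: total weight of the misclassified samples.
    error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    # Clamp to avoid log(0) or division by zero for perfect or useless learners.
    error = min(max(error, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified samples (y * h == -1) gain weight, correct ones lose it.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return alpha, [w / total for w in updated]
```

After the update, the misclassified samples together carry half of the total weight, which forces the next classifier to concentrate on them.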
|
14
|
Roeben B, Schüle R, Ruf S, Bender B, Alhaddad B, Benkert T, Meitinger T, Reich S, Böhringer J, Langhans CD, Vaz FM, Wortmann SB, Marquardt T, Haack TB, Krägeloh-Mann I, Schöls L, Synofzik M. SERAC1 deficiency causes complicated HSP: evidence from a novel splice mutation in a large family. J Med Genet 2017; 55:39-47. [DOI: 10.1136/jmedgenet-2017-104622] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 02/22/2017] [Revised: 08/24/2017] [Accepted: 08/31/2017] [Indexed: 12/21/2022]
Abstract
Objective To demonstrate that mutations in the phosphatidylglycerol remodelling enzyme SERAC1 can cause juvenile-onset complicated hereditary spastic paraplegia (cHSP) clusters, thus adding SERAC1 to the increasing number of complex lipid cHSP genes. Methods Combined genomic and functional validation studies (whole-exome sequencing, mRNA, cDNA and protein), biomarker investigations (3-methyl-glutaconic acid, filipin staining and phosphatidylglycerols PG34:1/PG36:1), and clinical and imaging phenotyping were performed in six affected subjects from two different branches of a large consanguineous family. Results 5 of 6 affected subjects shared cHSP as a common disease phenotype. Three subjects presented with juvenile-onset oligosystemic cHSP, still able to walk several miles at age >10–20 years. This benign phenotypic cluster and disease progression is strikingly divergent to the severe infantile phenotype of all SERAC1 cases reported so far. Two family members showed a more multisystemic juvenile-onset cHSP, indicating an intermediate phenotype between the benign oligosystemic cHSP and the classic infantile SERAC1 cluster. The homozygous splice mutation led to loss of the full-length SERAC1 protein and impaired phosphatidylglycerol PG34:1/PG36:1 remodelling. These phosphatidylglycerol changes, however, were milder than in classic infantile-onset SERAC1 cases, which might partially explain the milder SERAC1 phenotype. Conclusions Our findings add SERAC1 to the increasing list of complex lipid cHSP genes. At the same time they redefine the phenotypic spectrum of SERAC1 deficiency. It is associated not only with the severe infantile-onset ‘Methylglutaconic aciduria, Deafness, Encephalopathy, Leigh-like’ syndrome (MEGDEL syndrome), but also with oligosystemic juvenile-onset cHSP as part of the now unfolding SERAC1 deficiency spectrum.
|
15
|
Nunes Pinto CL, Nobre CN, Zárate LE. Transductive learning as an alternative to translation initiation site identification. BMC Bioinformatics 2017; 18:81. [PMID: 28152994 PMCID: PMC5290616 DOI: 10.1186/s12859-017-1502-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/25/2016] [Accepted: 01/28/2017] [Indexed: 11/23/2022]
Abstract
Background The correct identification of protein coding regions is an important and latent problem in molecular biology. The problem is challenging because of limited knowledge of the biological systems and unfamiliarity with the conserved characteristics of messenger RNA (mRNA). It is therefore fundamental to investigate computational methods that help discover patterns for identifying Translation Initiation Sites (TIS). In bioinformatics, machine learning methods based on inductive inference, such as the Inductive Support Vector Machine (ISVM), have been widely applied. By contrast, little attention has been given to machine learning methods based on transductive inference, such as the Transductive Support Vector Machine (TSVM). Transductive inference performs well for problems in which the number of unlabeled sequences is considerably greater than the number of labeled ones. TIS prediction may likewise benefit from transductive methods, because the number of new sequences grows rapidly as genome projects enable the study of new organisms. Consequently, this work investigates transductive learning for TIS identification and compares the results with those obtained with the inductive method. Results Transductive inference gives better results in both F-measure and sensitivity than the inductive method for predicting the TIS. Additionally, it has the lowest failure rate in identifying the TIS, producing fewer False Negatives (FN) than the ISVM. The ISVM and TSVM methods were validated with molecules from the most representative organisms contained in the RefSeq database: Rattus norvegicus, Mus musculus, Homo sapiens, Drosophila melanogaster and Arabidopsis thaliana. The transductive method achieved F-measure and sensitivity higher than 90%, and also higher than the results obtained with ISVM.
The ISVM and TSVM approaches were implemented in the TransduTIS tool, as TransduTIS-I and TransduTIS-T respectively, available through a web interface. These approaches were compared with the TISHunter, TIS Miner and NetStart tools, with satisfactory results. Conclusions In terms of precision, the results are similar for the ISVM and TSVM classifiers. However, the results show that applying the TSVM approach brought an improvement, especially in F-measure and sensitivity. Moreover, it was possible to identify a potential application of TSVM: organisms in the initial study phase with few identified sequences in the databases. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1502-6) contains supplementary material, which is available to authorized users.
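The metrics compared in this abstract (sensitivity, precision and F-measure) follow directly from confusion-matrix counts, and the sketch below shows why fewer False Negatives raise sensitivity. The function names are chosen here for illustration, and any counts passed in are the caller's own, not results from the paper.

```python
def sensitivity(tp, fn):
    """Fraction of true sites that were found (also called recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Fraction of predicted sites that are true sites."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and sensitivity."""
    p = precision(tp, fp)
    r = sensitivity(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

For example, with made-up counts, moving five sites from FN to TP (from `tp=90, fn=10` to `tp=95, fn=5`) raises sensitivity from 0.90 to 0.95 while precision changes only slightly.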
Affiliation(s)
- Cristiane Neri Nobre
- Pontifical Catholic University of Minas Gerais - PUC-MG, 255, Walter Ianni Street, Belo Horizonte, 31980-110, Brazil
- Luis Enrique Zárate
- Pontifical Catholic University of Minas Gerais - PUC-MG, 255, Walter Ianni Street, Belo Horizonte, 31980-110, Brazil
|
16
|
Mal-Lys: prediction of lysine malonylation sites in proteins integrated sequence-based features with mRMR feature selection. Sci Rep 2016; 6:38318. [PMID: 27910954 PMCID: PMC5133563 DOI: 10.1038/srep38318] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 08/18/2015] [Accepted: 11/08/2016] [Indexed: 12/25/2022]
Abstract
Lysine malonylation is an important post-translational modification (PTM) in proteins and has been associated with diseases. However, identifying malonyllysine sites remains a great challenge because the experiments are labor-intensive and time-consuming. A useful computational method and an efficient predictor are therefore highly desirable. In this study, we propose Mal-Lys, a predictor that incorporates residue sequence order information, position-specific amino acid propensity and physicochemical properties. The minimum Redundancy Maximum Relevance (mRMR) feature selection method was used to select the optimal features from the full feature set. With leave-one-out validation, the area under the curve (AUC) was 0.8143, and 6-, 8- and 10-fold cross-validations gave similar AUC values, which shows the robustness of Mal-Lys. The predictor also performed satisfactorily on experimental data from the UniProt database. A user-friendly web server for Mal-Lys is accessible at http://app.aporc.org/Mal-Lys/.
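The AUC reported above can be read as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one. The sketch below computes it with that pairwise definition; the function name and the 0/1 label convention are assumptions, and the quadratic loop is meant for clarity rather than speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (labels are 1 or 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

In a leave-one-out setting, each score would come from a model trained on all other samples, and the AUC would be computed once over the pooled held-out scores.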
|
17
|
Reuter K, Biehl A, Koch L, Helms V. PreTIS: A Tool to Predict Non-canonical 5' UTR Translational Initiation Sites in Human and Mouse. PLoS Comput Biol 2016; 12:e1005170. [PMID: 27768687 PMCID: PMC5074520 DOI: 10.1371/journal.pcbi.1005170] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 02/29/2016] [Accepted: 09/27/2016] [Indexed: 02/03/2023]
Abstract
Translation of mRNA sequences into proteins typically starts at an AUG triplet. In rare cases, translation may also start at alternative non-AUG codons located in the annotated 5' UTR, which increases regulatory complexity. Since ribosome profiling detects translational start sites at the nucleotide level, the properties of these start sites can then be used for the statistical evaluation of functional open reading frames. We developed a linear regression approach to predict in-frame and out-of-frame translational start sites within the 5' UTR from mRNA sequence information together with their translation initiation confidence. Predicted start codons comprise AUG as well as near-cognate codons. The underlying datasets are based on published translational start sites for human HEK293 and mouse embryonic stem cells that were derived by the original authors from ribosome profiling data. The average prediction accuracy of true vs. false start sites for HEK293 cells was 80%. When applied to mouse mRNA sequences, the same model predicted translation initiation sites observed in mouse ES cells with an accuracy of 76%. Moreover, we illustrate the effect of in silico mutations in the flanking sequence context of a start site on the predicted initiation confidence. Our new webservice PreTIS visualizes alternative start sites and their respective ORFs and predicts their ability to initiate translation. Only the mRNA sequence is required as input. PreTIS is accessible at http://service.bioinformatik.uni-saarland.de/pretis.
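Before any scoring model can be applied, candidate start sites in the 5' UTR have to be enumerated. The sketch below lists AUG and near-cognate codons (those differing from AUG by a single nucleotide) upstream of the annotated start and flags whether each is in frame with it. The function names and the 0-based `cds_start` argument are assumptions for illustration; this is not the PreTIS regression model, only the enumeration step it presupposes.

```python
def near_cognate_codons():
    """All codons that differ from AUG at exactly one position."""
    codons = set()
    for i in range(3):
        for base in "ACGU":
            if base != "AUG"[i]:
                codons.add("AUG"[:i] + base + "AUG"[i + 1:])
    return codons


def candidate_starts(mrna, cds_start):
    """List (position, codon, in_frame) for candidates in the 5' UTR.

    `cds_start` is the 0-based index of the annotated start codon; only
    codons lying entirely upstream of it are reported.
    """
    allowed = near_cognate_codons() | {"AUG"}
    hits = []
    for pos in range(0, cds_start - 2):
        codon = mrna[pos:pos + 3]
        if codon in allowed:
            # In frame when the distance to the annotated start is a multiple of 3.
            hits.append((pos, codon, (cds_start - pos) % 3 == 0))
    return hits
```

Each reported position could then be paired with its flanking sequence context, which is the kind of input a confidence model such as the one described above would score.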
Affiliation(s)
- Kerstin Reuter
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany
- Alexander Biehl
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Laurena Koch
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Volkhard Helms
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
|
18
|
BP Neural Network Could Help Improve Pre-miRNA Identification in Various Species. BIOMED RESEARCH INTERNATIONAL 2016; 2016:9565689. [PMID: 27635401 PMCID: PMC5011242 DOI: 10.1155/2016/9565689] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 05/17/2016] [Revised: 07/05/2016] [Accepted: 07/17/2016] [Indexed: 01/21/2023]
Abstract
MicroRNAs (miRNAs) are a set of short (21–24 nt) noncoding RNAs that play significant regulatory roles in cells. In the past few years, research on miRNA-related problems has become a hot field of bioinformatics because of miRNAs' essential biological function. miRNA-related bioinformatics analysis is beneficial in several aspects, including the functions of miRNAs and other genes, the regulatory network between miRNAs and their target mRNAs, and even biological evolution. Distinguishing miRNA precursors from other hairpin-like sequences is important and is an essential procedure in detecting novel microRNAs. In this study, we employed backpropagation (BP) neural network together with 98-dimensional novel features for microRNA precursor identification. Results show that the precision and recall of our method are 95.53% and 96.67%, respectively. Results further demonstrate that the total prediction accuracy of our method is nearly 13.17% greater than the state-of-the-art microRNA precursor prediction software tools.
|
19
|
Liu Y, Wan X. Information bottleneck based incremental fuzzy clustering for large biomedical data. J Biomed Inform 2016; 62:48-58. [PMID: 27260783 DOI: 10.1016/j.jbi.2016.05.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/28/2015] [Revised: 04/24/2016] [Accepted: 05/30/2016] [Indexed: 10/21/2022]
Abstract
Incremental fuzzy clustering combines advantages of fuzzy clustering and incremental clustering, and therefore is important in classifying large biomedical literature. Conventional algorithms, suffering from data sparsity and high-dimensionality, often fail to produce reasonable results and may even assign all the objects to a single cluster. In this paper, we propose two incremental algorithms based on information bottleneck, Single-Pass fuzzy c-means (spFCM-IB) and Online fuzzy c-means (oFCM-IB). These two algorithms modify conventional algorithms by considering different weights for each centroid and object and scoring mutual information loss to measure the distance between centroids and objects. spFCM-IB and oFCM-IB are used to group a collection of biomedical text abstracts from Medline database. Experimental results show that clustering performances of our approaches are better than such prominent counterparts as spFCM, spHFCM, oFCM and oHFCM, in terms of accuracy.
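For orientation, the conventional fuzzy c-means membership update that these algorithms modify can be sketched as follows. This is the textbook formula with Euclidean distance, not the information-bottleneck variant proposed in the paper, which replaces the distance with a mutual information loss; the function name and the default fuzzifier `m=2.0` are assumptions.

```python
import math


def fcm_memberships(point, centroids, m=2.0):
    """Standard fuzzy c-means memberships of one point to each centroid."""
    dists = [math.dist(point, c) for c in centroids]
    if any(d == 0 for d in dists):
        # The point coincides with one or more centroids: split membership among them.
        zeros = [d == 0 for d in dists]
        return [1 / sum(zeros) if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
```

Memberships always sum to one, so a point far from every centroid still spreads its weight across clusters instead of being discarded.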
Affiliation(s)
- Yongli Liu
- School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China
- Xing Wan
- School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China
|
20
|
Magana-Mora A, Ashoor H, Jankovic BR, Kamau A, Awara K, Chowdhary R, Archer JAC, Bajic VB. Dragon TIS Spotter: an Arabidopsis-derived predictor of translation initiation sites in plants. Bioinformatics 2012; 29:117-8. [PMID: 23110968 PMCID: PMC3530916 DOI: 10.1093/bioinformatics/bts638] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 01/08/2023]
Abstract
Summary: In higher eukaryotes, the identification of translation initiation sites (TISs) has been focused on finding these signals in cDNA or mRNA sequences. Using Arabidopsis thaliana (A.t.) information, we developed a prediction tool for signals within genomic sequences of plants that correspond to TISs. Our tool requires only genome sequence, not expressed sequences. Its sensitivity/specificity is for A.t. (90.75%/92.2%), for Vitis vinifera (66.8%/94.4%) and for Populus trichocarpa (81.6%/94.4%), which suggests that our tool can be used in annotation of different plant genomes. We provide a list of features used in our model. Further study of these features may improve our understanding of mechanisms of the translation initiation. Availability and implementation: Our tool is implemented as an artificial neural network. It is available as a web-based tool and, together with the source code, the list of features, and data used for model development, is accessible at http://cbrc.kaust.edu.sa/dts. Contact: vladimir.bajic@kaust.edu.sa Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Arturo Magana-Mora
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
|
21
|
Tzanis G, Berberidis C, Vlahavas I. StackTIS: A stacked generalization approach for effective prediction of translation initiation sites. Comput Biol Med 2012; 42:61-9. [DOI: 10.1016/j.compbiomed.2011.10.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/05/2009] [Accepted: 10/20/2011] [Indexed: 10/15/2022]
|
22
|
KOH CHUANHOCK, LIN SHARENE, JEDD GREGORY, WONG LIMSOON. SIRIUS PSB: A GENERIC SYSTEM FOR ANALYSIS OF BIOLOGICAL SEQUENCES. J Bioinform Comput Biol 2011; 7:973-90. [DOI: 10.1142/s0219720009004436] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/05/2009] [Revised: 08/15/2009] [Accepted: 08/15/2009] [Indexed: 11/18/2022]
Abstract
Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models — one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at:
Affiliation(s)
- CHUAN HOCK KOH
- School of Computing, National University of Singapore, COM1, Computing Drive, 117417, Singapore
- NUS Graduate School for Integrative Sciences and Engineering, 117597, Singapore
- SHARENE LIN
- School of Computing, National University of Singapore, COM1, Computing Drive, 117417, Singapore
- GREGORY JEDD
- Temasek Life Sciences Laboratory and Department of Biological Sciences, National University of Singapore, 117604, Singapore
- LIMSOON WONG
- School of Computing, National University of Singapore, COM1, Computing Drive, 117417, Singapore
|
23
|
Einarsdóttir K, Preen DB, Emery JD, Holman CDJ. Regular primary care plays a significant role in secondary prevention of ischemic heart disease in a Western Australian cohort. J Gen Intern Med 2011; 26:1092-7. [PMID: 21347875 PMCID: PMC3181311 DOI: 10.1007/s11606-011-1665-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 08/26/2010] [Revised: 12/22/2010] [Accepted: 02/07/2011] [Indexed: 10/18/2022]
Abstract
BACKGROUND Secondary prevention for established ischaemic heart disease (IHD) involves medication therapy and a healthier lifestyle, but adherence is suboptimal. Simply having scheduled regular appointments with a primary care physician could confer a benefit for IHD patients, possibly through increased motivation and awareness, but this has not previously been investigated in the literature. OBJECTIVE To estimate the association between regular general practitioner (GP) visitation and rates of all-cause death, IHD death or repeat hospitalisation for IHD in older patients in Western Australia (WA). DESIGN A retrospective cohort design. PARTICIPANTS Patients aged ≥ 65 years (n = 31,841) with a history of hospitalisation for IHD from 1992-2006 were ascertained through routine health data collected on the entire WA population and included in the analysis. MAIN MEASURES Frequency and regularity of GP visits were determined during a three-year exposure period at commencement of follow-up. A regularity score (range 0-1) measured the regularity of intervals between the GP visits and was divided into quartiles. Patients were then followed for a maximum of 11.5 years for outcome determination. Hazard ratios and 95% confidence intervals were calculated using Cox proportional hazards models. KEY RESULTS Compared with the least regular quartile, patients with greater GP visit regularity had significantly decreased risks of all-cause death (2nd least, 2nd most and most regular: HR = 0.76, 0.71 and 0.71) and IHD death (2nd least, 2nd most and most regular: HR = 0.70, 0.68 and 0.65). Patients in the 2nd least regular quartile also appeared to experience decreased risk of any repeat IHD hospitalisation (HR = 0.83, 95%CI 0.71-0.96) as well as emergency hospitalisation (HR = 0.81, 95%CI 0.67-0.98), compared with the least regular quartile.
CONCLUSIONS Some degree of regular GP visitation offers a small but significant protection against morbidity and mortality in older people with established IHD. The findings indicate the importance of scheduled, regular GP visits for the secondary prevention of IHD.
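The abstract does not state how the regularity score is computed. The following is a minimal sketch only, assuming the score is 1 / (1 + SD of the gaps in days between consecutive visits), which is one definition that falls in the stated 0-1 range. The function names, the use of the population standard deviation, and the quartile cut-point method are illustrative assumptions, not the authors' code.

```python
from statistics import pstdev, quantiles


def regularity_score(visit_days):
    """Assumed definition: 1 / (1 + SD of gaps between consecutive visits).

    `visit_days` holds visit dates as day offsets from the start of the
    exposure period. Perfectly even spacing gives 1.0; erratic spacing
    pushes the score towards 0.
    """
    days = sorted(visit_days)
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    if len(gaps) < 2:
        return None  # too few intervals to say anything about regularity
    return 1.0 / (1.0 + pstdev(gaps))


def quartile_labels(scores):
    """Label each score 1 (least regular) to 4 (most regular)."""
    q1, q2, q3 = quantiles(scores, n=4)
    # Each comparison contributes 0 or 1, so the label is 1 + cut points passed.
    return [1 + (s > q1) + (s > q2) + (s > q3) for s in scores]
```

Under this assumed definition, a patient seen every 30 days scores 1.0, while a patient with the same number of visits at uneven intervals scores lower, so regularity is separated from visit frequency.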
|
24
|
Wong L. New results in biological sequence analysis, complex gene–disease association, qPCR calculation, and biological text mining. J Bioinform Comput Biol 2010; 8:v-ii. [PMID: 21046831 DOI: 10.1142/s0219720010005233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
25
|
Einarsdóttir K, Preen DB, Emery JD, Kelman C, Holman CDJ. Regular primary care lowers hospitalisation risk and mortality in seniors with chronic respiratory diseases. J Gen Intern Med 2010; 25:766-73. [PMID: 20425147 PMCID: PMC2896607 DOI: 10.1007/s11606-010-1361-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 07/27/2009] [Accepted: 03/19/2010] [Indexed: 11/28/2022]
Abstract
BACKGROUND Exacerbations in chronic respiratory diseases (CRDs) are sensitive to seasonal variations in exposure to respiratory infectious agents and allergens and patient factors such as non-adherence. Hence, regular general practitioner (GP) contact is likely to be important in order to recognise symptom escalation early and adjust treatment. OBJECTIVE To examine the association of regularity of GP visits with all-cause mortality and first CRD hospitalisation overall and within groups of pharmacotherapy level in older CRD patients. DESIGN A retrospective cohort design using linked hospital, mortality, Medicare and pharmaceutical data for participant, exposure and outcome ascertainment. GP visit pattern was measured during the first 3 years of the observation period. Patients were then followed for a maximum of 11.5 years for ascertainment of hospitalisations and deaths. PARTICIPANTS We studied 108,455 patients aged ≥ 65 years with CRD in Western Australia (WA) during 1992-2006. MAIN MEASURES A GP visit regularity score (range 0-1) was calculated and divided into quintiles. A clinician consensus panel classified levels of pharmacotherapy. Cox proportional hazards models, controlling for multiple factors including GP visit frequency, were used to calculate hazard ratios and confidence intervals. KEY RESULTS Differences in survival curves and hospital avoidance pattern between the GP visit regularity quintiles were statistically significant (p = 0.0279 and p < 0.0001, respectively). The protective association between GP visit regularity and death appeared to be confined to the highest pharmacotherapy level group (P for interaction = 0.0001). Higher GP visit regularity protected against first CRD hospitalisation compared with the least regular quintile regardless of pharmacotherapy level (medium regular: HR = 0.84, 95% CI = 0.77-0.92; 2nd most regular: HR = 0.74, 95% CI = 0.67-0.82; most regular HR = 0.77, 95% CI = 0.68-0.86).
CONCLUSIONS The findings indicate that regular and proactive 'maintenance' primary care, as distinct from 'reactive' care, is beneficial to older CRD patients by reducing their risks of hospitalisation and death.
Affiliation(s)
- Kristjana Einarsdóttir
- Centre for Health Services Research, School of Population Health M431, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, Perth, Australia.
|
26
|
Regular Primary Care Decreases the Likelihood of Mortality in Older People With Epilepsy. Med Care 2010; 48:472-6. [DOI: 10.1097/mlr.0b013e3181d68994] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
|
27
|
Yang Y, Wang YP, Li KB. MiRTif: a support vector machine-based microRNA target interaction filter. BMC Bioinformatics 2008; 9 Suppl 12:S4. [PMID: 19091027 PMCID: PMC2638144 DOI: 10.1186/1471-2105-9-s12-s4] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND MicroRNAs (miRNAs) are a set of small non-coding RNAs serving as important negative gene regulators. In animals, miRNAs turn down protein translation by binding to the 3' UTR regions of target genes with imperfect complementary pairing. The identification of microRNA targets has become one of the major challenges of miRNA research. Bioinformatics investigations of miRNA targets have resulted in a number of target prediction tools. Although these tools are capable of predicting hundreds of targets for a given miRNA, many of them suffer from high false positive rates, indicating the need for a post-processing filter for the predicted targets. Once trained with experimentally validated true and false targets, machine learning methods appear to be ideal approaches to distinguish the true targets from the false ones. RESULTS We present a miRNA target filtering system named MiRTif (miRNA:target interaction filter). The system is a support vector machine (SVM) classifier trained with 195 positive and 38 negative miRNA:target interaction pairs, all experimentally validated. Each miRNA:target interaction pair is divided into a seed and a non-seed region. The encoded feature vector contains various k-gram frequencies in the seed, the non-seed and the entire regions. Informative features are selected based on their discriminating abilities. Prediction accuracies are assessed using 10-fold cross-validation experiments. Our system achieves an AUC (area under the ROC curve) of 0.86, sensitivity of 83.59%, and specificity of 73.68%. More importantly, the system correctly predicts the majority of the false positive miRNA:target interactions (28 out of 38). The possibility of over-fitting due to the relatively small negative sample set has also been investigated using a set of non-validated and randomly selected targets (from miRBase).
CONCLUSION MiRTif is designed as a post-processing filter that takes miRNA:target interactions predicted by other target prediction software such as TargetScanS, PicTar and miRanda as inputs, and determines how likely a given interaction is to be a real one rather than a pseudo one. MiRTif can be accessed from http://bsal.ym.edu.tw/mirtif.
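The k-gram encoding described in the abstract can be illustrated with a short sketch. This is a simplification under stated assumptions: each region is treated as a plain RNA string (the actual system encodes the miRNA:target pairing), the "entire region" is taken to be the seed followed by the non-seed, and the function names and the choice of k = 1 and 2 are hypothetical. The last line checks the reported specificity against the counts given in the abstract (28 of 38 negatives).

```python
from collections import Counter
from itertools import product


def kgram_frequencies(seq, k, alphabet="ACGU"):
    """Relative frequency of every possible k-gram over `alphabet` in `seq`."""
    windows = max(len(seq) - k + 1, 0)
    counts = Counter(seq[i:i + k] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero on very short regions
    return {"".join(gram): counts["".join(gram)] / total
            for gram in product(alphabet, repeat=k)}


def encode_pair(seed, non_seed, ks=(1, 2)):
    """Concatenate k-gram frequencies for the seed, non-seed and entire regions."""
    features = []
    for region in (seed, non_seed, seed + non_seed):  # "entire" = seed + non-seed (assumption)
        for k in ks:
            freqs = kgram_frequencies(region, k)
            features.extend(freqs[gram] for gram in sorted(freqs))
    return features


# Specificity from the abstract's counts: 28 of 38 negatives correctly rejected.
specificity = 28 / 38
```

With k = 1 and 2 over a four-letter alphabet, each region contributes 4 + 16 values, so one pair is encoded as a 60-dimensional vector that could then be passed to any SVM implementation.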
Affiliation(s)
- Yuchen Yang
- Institute of Molecular and Cell Biology, 61 Biopolis Drive, 138673, Singapore.
|
28
|
Baten AKMA, Halgamuge SK, Chang BCH. Fast splice site detection using information content and feature reduction. BMC Bioinformatics 2008; 9 Suppl 12:S8. [PMID: 19091031 PMCID: PMC2638148 DOI: 10.1186/1471-2105-9-s12-s8] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate identification of splice sites in DNA sequences plays a key role in the prediction of gene structure in eukaryotes. Many computational methods have already been proposed for the detection of splice sites, and some of them show high prediction accuracy. However, most of these methods are limited by their long computation time when applied to whole-genome sequence data. RESULTS In this paper we propose a hybrid algorithm which combines several effective and informative input features with the state-of-the-art support vector machine (SVM). To obtain the input features we employ an information content method based on Shannon's information theory, Shapiro's score scheme, and Markovian probabilities. We also use a feature elimination scheme to remove the less informative features from the input data. CONCLUSION In this study we propose a new feature-based splice site detection method that shows improved acceptor and donor splice site detection in DNA sequences when its performance is compared with various state-of-the-art and well-known methods.
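As a rough illustration of an information-content feature, the sketch below computes the standard per-position Shannon information content (in bits) of a set of aligned sites, i.e. log2 of the alphabet size minus the observed entropy at each column. It is an assumption that the paper's feature resembles this textbook quantity; the function name and the uniform-background choice are illustrative.

```python
from collections import Counter
from math import log2


def information_content(aligned_sites, alphabet="ACGT"):
    """Per-position information content (bits) of equal-length aligned sequences."""
    if not aligned_sites:
        raise ValueError("need at least one aligned site")
    max_entropy = log2(len(alphabet))  # 2 bits for DNA
    profile = []
    for pos in range(len(aligned_sites[0])):
        counts = Counter(site[pos] for site in aligned_sites)
        total = sum(counts[base] for base in alphabet)
        # Shannon entropy of the observed base distribution at this column.
        entropy = -sum((counts[base] / total) * log2(counts[base] / total)
                       for base in alphabet if counts[base] > 0)
        profile.append(max_entropy - entropy)
    return profile
```

A fully conserved column (for instance the GT of a donor site) reaches the maximum of 2 bits, while a column with uniformly distributed bases scores 0, which is why low-scoring positions are natural candidates for feature elimination.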
Affiliation(s)
- AKMA Baten
- Biomechanical Engineering Research Group, Department of Mechanical Engineering, Melbourne School of Engineering, The University of Melbourne, Victoria 3010, Australia
- SK Halgamuge
- Biomechanical Engineering Research Group, Department of Mechanical Engineering, Melbourne School of Engineering, The University of Melbourne, Victoria 3010, Australia
- BCH Chang
- Institute of Plant and Microbial Biology, Academia Sinica, Taiwan
|
29
|
|
30
|
Dogan RI, Getoor L, Wilbur WJ, Mount SM. Features generated for computational splice-site prediction correspond to functional elements. BMC Bioinformatics 2007; 8:410. [PMID: 17958908 PMCID: PMC2241647 DOI: 10.1186/1471-2105-8-410] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 03/19/2007] [Accepted: 10/24/2007] [Indexed: 11/16/2022] Open
Abstract
Background Accurate selection of splice sites during the splicing of precursors to messenger RNA requires both relatively well-characterized signals at the splice sites and auxiliary signals in the adjacent exons and introns. We previously described a feature generation algorithm (FGA) that is capable of achieving high classification accuracy on human 3' splice sites. In this paper, we extend the splice-site prediction to 5' splice sites and explore the generated features for biologically meaningful splicing signals. Results We present examples from the observed features that correspond to known signals, both core signals (including the branch site and pyrimidine tract) and auxiliary signals (including GGG triplets and exon splicing enhancers). We present evidence that features identified by FGA include splicing signals not found by other methods. Conclusion Our generated features capture known biological signals in the expected sequence interval flanking splice sites. The method can be easily applied to other species and to similar classification problems, such as tissue-specific regulatory elements, polyadenylation sites, promoters, etc.
|
31
|
Tzanis G, Berberidis C, Vlahavas I. MANTIS: A Data Mining Methodology for Effective Translation Initiation Site Prediction. Annu Int Conf IEEE Eng Med Biol Soc 2007; 2007:6344-8. [DOI: 10.1109/iembs.2007.4353806] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
|
32
|
Towsey MW, Gordon JJ, Hogan JM. The prediction of bacterial transcription start sites using SVMs. Int J Neural Syst 2007; 16:363-70. [PMID: 17117497 DOI: 10.1142/s0129065706000767] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Identifying promoters is the key to understanding gene expression in bacteria. Promoters lie in tightly constrained positions relative to the transcription start site (TSS). In this paper, we address the problem of predicting transcription start sites in Escherichia coli. Knowing the TSS position, one can then predict the promoter position to within a few base pairs, and vice versa. The accepted method for promoter prediction is to use a pair of position weight matrices (PWMs), which define conserved motifs at the sigma-factor binding site. However this method is known to result in a large number of false positive predictions, thereby limiting its usefulness to the experimental biologist. We adopt an alternative approach based on the Support Vector Machine (SVM) using a modified mismatch spectrum kernel. Our modifications involve tagging the motifs with their location, and selectively pruning the feature set. We quantify the performance of several SVM models and a PWM model using a performance metric of area under the detection-error tradeoff (DET) curve. SVM models are shown to outperform the PWM on a biologically realistic TSS prediction task. We also describe a more broadly applicable peak scoring technique which reduces the number of false positive predictions, greatly enhancing the utility of our results.
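The position weight matrix baseline mentioned in the abstract can be sketched as follows. This is a generic log-odds PWM against a uniform background with a pseudocount, scanning a sequence with a single matrix; the abstract refers to a pair of PWMs, and the function names, pseudocount and background are assumptions made for illustration.

```python
from math import log2


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Log-odds PWM (uniform background) from aligned, equal-length sites."""
    background = 1.0 / len(alphabet)
    denom = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({base: log2(((column.count(base) + pseudocount) / denom) / background)
                    for base in alphabet})
    return pwm


def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in `sequence`."""
    width = len(pwm)
    scored = [(sum(pwm[i][sequence[offset + i]] for i in range(width)), offset)
              for offset in range(len(sequence) - width + 1)]
    return max(scored)
```

Because every window receives a score, a threshold low enough to catch weak sites also admits many spurious windows, which is the false-positive problem the abstract attributes to PWM-based prediction. A toy matrix built from a few hand-made variants of the TATAAT consensus will rank a TATAAT window first.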
Affiliation(s)
- Michael W Towsey
- Faculty of Information Technology, Queensland University of Technology, GPO Box 2434, Brisbane, Queensland 4001, Australia.
|
33
|
Bodył A, Mackiewicz P. Analysis of the targeting sequences of an iron-containing superoxide dismutase (SOD) of the dinoflagellate Lingulodinium polyedrum suggests function in multiple cellular compartments. Arch Microbiol 2006; 187:281-96. [PMID: 17143625 DOI: 10.1007/s00203-006-0194-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 10/02/2006] [Accepted: 11/06/2006] [Indexed: 01/19/2023]
Abstract
One of the proteins targeted to the peridinin plastid of the dinoflagellate Lingulodinium polyedrum is the iron-containing superoxide dismutase (LpSOD). Like dinoflagellate plastid proteins of class II, LpSOD carries a bipartite presequence comprising a signal peptide followed by a transit peptide. Our bioinformatic studies suggest that its signal peptide is atypical, however, and that the entire presequence may function as a mitochondrial targeting signal. It is possible that LpSOD represents a new class of proteins in algae with complex plastids, which are co-targeted to the plastid and mitochondrion. In addition to the ambiguous N-terminal targeting signal, LpSOD contains a potential type-1 peroxisome-targeting signal (PTS1) located at its C-terminus. In accordance with a peroxisome localization of this dismutase, its mRNA has two in-frame AUG codons. Our bioinformatic analyses indicate that the first start codon resides in a much weaker oligonucleotide context than the second one. This suggests that synthesis of the plastid/mitochondrion-targeted and peroxisome-targeted isoforms could proceed through so-called leaky scanning. Moreover, our results show that expression of the two isoforms could be regulated by a 'hairpin' structure located between the first and second start codons.
Affiliation(s)
- Andrzej Bodył
- Department of Biodiversity and Evolutionary Taxonomy, Zoological Institute, University of Wrocław, ul. Przybyszewskiego 63/77, 51-148 Wrocław, Poland.
|
34
|
Prediction of Translation Initiation Sites Using Classifier Selection. ADVANCES IN ARTIFICIAL INTELLIGENCE 2006. [DOI: 10.1007/11752912_37] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
|
35
|
Chen L, Chen R, Nilufar S. Improving the performance of 1D object classification by using the Electoral College. Knowl Inf Syst 2005. [DOI: 10.1007/s10115-005-0232-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/01/2022]
|
36
|
Li GL, Leong TY. Feature selection for the prediction of translation initiation sites. GENOMICS, PROTEOMICS & BIOINFORMATICS 2005; 3:73-83. [PMID: 16393144 PMCID: PMC5172590 DOI: 10.1016/s1672-0229(05)03012-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
Translation initiation sites (TISs) are important signals in cDNA sequences. In many previous attempts to predict TISs in cDNA sequences, three major factors affect the prediction performance: the nature of the cDNA sequence sets, the relevant features selected, and the classification methods used. In this paper, we examine different approaches to select and integrate relevant features for TIS prediction. The top selected significant features include the features from the position weight matrix and the propensity matrix, the number of nucleotide C in the sequence downstream ATG, the number of downstream stop codons, the number of upstream ATGs, and the number of some amino acids, such as amino acids A and D. With the numerical data generated from these features, different classification methods, including decision tree, naïve Bayes, and support vector machine, were applied to three independent sequence sets. The identified significant features were found to be biologically meaningful, while the experiments showed promising results.
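Three of the count-based features listed in the abstract can be computed directly from a cDNA string, as in the sketch below. The abstract does not say how far downstream the counts extend or whether stop codons must be in frame, so the 99-nucleotide window, the in-frame restriction, and all names here are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def tis_features(cdna, atg_pos, window=99):
    """Count-based features for the candidate ATG starting at index `atg_pos`."""
    if cdna[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("atg_pos must point at an ATG")
    upstream = cdna[:atg_pos]
    downstream = cdna[atg_pos + 3:atg_pos + 3 + window]
    # Split the downstream window into codons in frame with the candidate ATG.
    codons = [downstream[i:i + 3] for i in range(0, len(downstream) - 2, 3)]
    return {
        # Every ATG lying entirely upstream of the candidate, in any frame.
        "upstream_atgs": sum(upstream.startswith("ATG", i) for i in range(len(upstream))),
        "downstream_c": downstream.count("C"),
        "downstream_in_frame_stops": sum(codon in STOP_CODONS for codon in codons),
    }
```

Calling `tis_features` for every ATG in a sequence yields one numeric row per candidate, which is the kind of table a decision tree, naïve Bayes or SVM classifier can be trained on.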
Affiliation(s)
- Guo Liang Li
- Medical Computing Laboratory, School of Computing, National University of Singapore.
|
37
|
Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM. Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome. Genome Biol 2005; 6:R40. [PMID: 15892868 PMCID: PMC1175952 DOI: 10.1186/gb-2005-6-5-r40] [Citation(s) in RCA: 167] [Impact Index Per Article: 8.8] [Received: 12/20/2004] [Revised: 02/09/2005] [Accepted: 03/11/2005] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests for the accuracy of future interaction assays and to consolidate the known interactions among human proteins. RESULTS We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover from Medline abstracts 6,580 interactions among 3,737 human proteins. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields, then interactions were identified by the co-occurrence of protein names across the set of Medline abstracts, filtering the interactions with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets. CONCLUSION These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. This set therefore represents no more than 10% of the complete network.
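The closing projection in this abstract is simple arithmetic and can be reproduced from the figures it quotes. The sketch below follows the abstract's own multiplication of interactions per protein by gene count; the variable names are illustrative.

```python
interactions_per_protein = 15   # best-sampled interaction set, per the abstract
human_genes = 25_000            # estimated human gene count, per the abstract
consolidated = 31_609           # interactions in the combined network

projected_total = interactions_per_protein * human_genes
coverage = consolidated / projected_total

print(f"projected interactions: {projected_total:,}")   # 375,000
print(f"coverage of projected network: {coverage:.1%}")  # 8.4%
```

The result, about 8.4%, agrees with the abstract's statement that the consolidated set represents no more than 10% of the complete network.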
Affiliation(s)
- Arun K Ramani
- Center for Systems and Synthetic Biology and Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA.
|
38
|
Rajapakse JC, Ho LS. Markov encoding for detecting signals in genomic sequences. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2005; 2:131-42. [PMID: 17044178 DOI: 10.1109/tcbb.2005.27] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 05/12/2023]
Abstract
We present a technique to encode the inputs to neural networks for the detection of signals in genomic sequences. The encoding is based on lower-order Markov models which incorporate known biological characteristics in genomic sequences. The neural networks then learn intrinsic higher-order dependencies of nucleotides at the signal sites. We demonstrate the efficacy of the Markov encoding method in the detection of three genomic signals, namely, splice sites, transcription start sites, and translation initiation sites.
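One way to picture a Markov encoding is to replace each nucleotide by its conditional probability under a low-order Markov model fitted to known signal sites, and feed the resulting numeric vector to a neural network. The sketch below uses a position-specific first-order model with a pseudocount; the model order, the smoothing and the function names are assumptions and may differ from the paper's formulation.

```python
from collections import defaultdict


def fit_markov(training_seqs, alphabet="ACGT", pseudocount=1.0):
    """Position-specific first-order model: P(x_i = b | x_{i-1} = a) for i >= 1."""
    model = []
    for i in range(1, len(training_seqs[0])):
        counts = defaultdict(lambda: defaultdict(float))
        for seq in training_seqs:
            counts[seq[i - 1]][seq[i]] += 1.0
        table = {}
        for prev in alphabet:
            row_total = sum(counts[prev].values()) + pseudocount * len(alphabet)
            table[prev] = {nxt: (counts[prev][nxt] + pseudocount) / row_total
                           for nxt in alphabet}
        model.append(table)
    return model


def markov_encode(seq, model):
    """Encode `seq` as the conditional probability of each base given its predecessor."""
    return [model[i - 1][seq[i - 1]][seq[i]] for i in range(1, len(seq))]
```

The encoded vector has one value per position after the first, so a network trained on it starts from first-order dependencies and only needs to learn the higher-order structure, which matches the division of labour described in the abstract.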
Affiliation(s)
- Jagath C Rajapakse
- BioInformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore 639798.
|
39
|
|
40
|
Papadimitriou S, Likothanassis SD. Kernel-based self-organized maps trained with supervised bias for gene expression data analysis. J Bioinform Comput Biol 2004; 1:647-80. [PMID: 15290758 DOI: 10.1142/s021972000400034x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 03/01/2003] [Revised: 06/09/2003] [Accepted: 07/28/2003] [Indexed: 12/22/2022]
Abstract
Self-Organized Maps (SOMs) are a popular approach for analyzing genome-wide expression data. However, most SOM-based approaches ignore prior knowledge about functional gene categories. Also, SOM-based approaches usually develop topographic maps with disjoint and uniform activation regions that correspond to a hard clustering of the patterns at their nodes. We present a novel Self-Organized Map, the Kernel Supervised Dynamic Grid Self-Organized Map (KSDG-SOM). This model adapts its parameters in a kernel space. Gaussian kernels are used, and their mean and variance components are adapted in order to optimize the fit to the input density. The KSDG-SOM also grows dynamically up to a size defined by statistical criteria. It is capable of incorporating a priori information about the known functional characteristics of genes. This information forms a supervised bias during cluster formation, and the model has the potential to revise incorrect functional labels. The new method overcomes the main drawback of most existing clustering methods, which lack a mechanism for dynamic extension based on a balance between unsupervised and supervised drives.
Affiliation(s)
- Stergios Papadimitriou
- Department of Information Management, Technological Education Institute of Kavala, 65404 Kavala, Greece.
|
41
|
Liu H, Han H, Li J, Wong L. DNAFSMiner: a web-based software toolbox to recognize two types of functional sites in DNA sequences. Bioinformatics 2004; 21:671-3. [PMID: 15284102 DOI: 10.1093/bioinformatics/bth437] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022] Open
Abstract
DNAFSMiner (DNA Functional Sites Miner) is a web-based software toolbox to recognize functional sites in nucleic acid sequences. Currently, the toolbox provides two programs: TIS Miner and Poly(A) Signal Miner. The TIS Miner can be used to predict translation initiation sites in vertebrate DNA/mRNA/cDNA sequences, and the Poly(A) Signal Miner can be used to predict polyadenylation [poly(A)] signals in human DNA sequences. The prediction results are better than those of methods from the literature on two benchmark applications. This good performance is mainly attributable to our unique learning method. DNAFSMiner is available free of charge for academic and non-profit organizations. AVAILABILITY http://research.i2r.a-star.edu.sg/DNAFSMiner/ CONTACT huiqing@i2r.a-star.edu.sg.
Affiliation(s)
- Huiqing Liu
- Institute for Infocomm Research 21 Heng Mui Keng Terrace, Singapore, 119613.
|