1
|
Zhai J, Wang W, Zhao R, Sun D, Lu D, Gong X. BDM: An Assessment Metric for Protein Complex Structure Models Based on Distance Difference Matrix. Interdiscip Sci 2024; 16:677-687. [PMID: 38536590 DOI: 10.1007/s12539-024-00622-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 02/07/2024] [Accepted: 02/17/2024] [Indexed: 09/19/2024]
Abstract
Protein complex structure prediction is an important problem in computational biology. While significant progress has been made for protein monomers, accurate evaluation of protein complexes remains challenging. Existing assessment methods in CASP lack dedicated metrics for evaluating complexes. DockQ, a widely used metric, has some limitations. In this study, we propose a novel metric called BDM (Based on Distance difference Matrix) for assessing predicted protein complex structures. Our approach utilizes a distance difference matrix derived from comparing real and predicted protein structures, establishing a linear correlation with Root Mean Square Deviation (RMSD). BDM overcomes limitations associated with receptor-ligand differentiation and eliminates the requirement for structure alignment, making it a more effective and efficient metric. Evaluation of BDM using CASP14 and CASP15 test sets demonstrates superior performance compared to the official CASP scoring. BDM provides accurate and reasonable assessments of predicted protein complexes. Wide adoption of BDM has the potential to advance protein complex structure prediction and facilitate related research across scientific domains. Code is available at http://mialab.ruc.edu.cn/BDMServer/.
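As a rough illustration of why a distance difference matrix needs no structure alignment, the sketch below compares the pairwise-distance matrices of a reference and a model. It is not the BDM score itself: the abstract does not give the formula, so the function names and the mean-absolute-difference summary are assumptions made here for illustration only.

```python
import math

def distance_matrix(coords):
    """All pairwise Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def distance_difference_matrix(reference, model):
    """Element-wise |d_ref - d_model|; both inputs list the same atoms in the same order."""
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same number of atoms")
    d_ref = distance_matrix(reference)
    d_mod = distance_matrix(model)
    n = len(reference)
    return [[abs(d_ref[i][j] - d_mod[i][j]) for j in range(n)] for i in range(n)]

def mean_abs_difference(ddm):
    """Hypothetical summary score: mean over the upper triangle (not the published BDM formula)."""
    n = len(ddm)
    if n < 2:
        return 0.0
    total = sum(ddm[i][j] for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# Same shape, rotated 90 degrees about z and translated: internal distances are unchanged.
moved = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 5.0, 5.0)]
# One atom displaced: internal distances change.
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

score_moved = mean_abs_difference(distance_difference_matrix(reference, moved))
score_stretched = mean_abs_difference(distance_difference_matrix(reference, stretched))
```

Because internal distances are invariant under rotation and translation, the rigidly moved copy scores zero without any superposition step, while the distorted copy does not.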
Affiliation(s)
- Jiaqi Zhai
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- Wenda Wang
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- Ranxi Zhao
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- Daiwen Sun
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- Da Lu
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
|
2
|
Sabsay KR, te Velthuis AJW. Using structure prediction of negative sense RNA virus nucleoproteins to assess evolutionary relationships. Virus Evol 2024; 10:veae058. [PMID: 39129834 PMCID: PMC11315766 DOI: 10.1093/ve/veae058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Revised: 05/21/2024] [Accepted: 07/19/2024] [Indexed: 08/13/2024] Open
Abstract
Negative sense RNA viruses (NSV) include some of the most detrimental human pathogens, including the influenza, Ebola, and measles viruses. NSV genomes consist of one or multiple single-stranded RNA molecules that are encapsidated into one or more ribonucleoprotein (RNP) complexes. These RNPs consist of viral RNA, a viral RNA polymerase, and many copies of the viral nucleoprotein (NP). Current evolutionary relationships within the NSV phylum are based on the alignment of conserved RNA-dependent RNA polymerase (RdRp) domain amino acid sequences. However, the RdRp domain-based phylogeny does not address whether NP, the other core protein in the NSV genome, evolved along the same trajectory or whether several RdRp-NP pairs evolved through convergent evolution in the segmented and non-segmented NSV genome architectures. Addressing how NP and the RdRp domain evolved may help us better understand NSV diversity. Since NP sequences are too short to infer robust phylogenetic relationships, we here used experimentally obtained and AlphaFold 2.0-predicted NP structures to probe whether evolutionary relationships can be estimated using NSV NP sequences. Following flexible structure alignments of modeled structures, we find that the structural homology of the NSV NPs reveals phylogenetic clusters that are consistent with RdRp-based clustering. In addition, we were able to assign viruses for which RdRp sequences are currently missing to phylogenetic clusters based on the available NP sequence. Both our RdRp-based and NP-based relationships deviate from the current NSV classification of the segmented Naedrevirales, which cluster with the other segmented NSVs in our analysis. Overall, our results suggest that the NSV RdRp and NP genes largely evolved along similar trajectories and even short pieces of genetic, protein-coding information can be used to infer evolutionary relationships, potentially making metagenomic analyses more valuable.
Affiliation(s)
- Kimberly R Sabsay
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, United States
  - Lewis Sigler Institute, Princeton University, Washington Road, Princeton, NJ 08544, United States
- Aartjan J W te Velthuis
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, United States
|
3
|
Sabsay KR, te Velthuis AJ. Using structure prediction of negative sense RNA virus nucleoproteins to assess evolutionary relationships. bioRxiv 2024:2024.02.16.580771. [PMID: 38405982 PMCID: PMC10888975 DOI: 10.1101/2024.02.16.580771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Negative sense RNA viruses (NSV) include some of the most detrimental human pathogens, including the influenza, Ebola and measles viruses. NSV genomes consist of one or multiple single-stranded RNA molecules that are encapsidated into one or more ribonucleoprotein (RNP) complexes. These RNPs consist of viral RNA, a viral RNA polymerase, and many copies of the viral nucleoprotein (NP). Current evolutionary relationships within the NSV phylum are based on alignment of conserved RNA-directed RNA polymerase (RdRp) domain amino acid sequences. However, the RdRp domain-based phylogeny does not address whether NP, the other core protein in the NSV genome, evolved along the same trajectory or whether several RdRp-NP pairs evolved through convergent evolution in the segmented and non-segmented NSV genome architectures. Addressing how NP and the RdRp domain evolved may help us better understand NSV diversity. Since NP sequences are too short to infer robust phylogenetic relationships, we here used experimentally obtained and AlphaFold 2.0-predicted NP structures to probe whether evolutionary relationships can be estimated using NSV NP sequences. Following flexible structure alignments of modeled structures, we find that the structural homology of the NSV NPs reveals phylogenetic clusters that are consistent with RdRp-based clustering. In addition, we were able to assign viruses for which RdRp sequences are currently missing to phylogenetic clusters based on the available NP sequence. Both our RdRp-based and NP-based relationships deviate from the current NSV classification of the segmented Naedrevirales, which cluster with the other segmented NSVs in our analysis. Overall, our results suggest that the NSV RdRp and NP genes largely evolved along similar trajectories and that even short pieces of genetic, protein-coding information can be used to infer evolutionary relationships, potentially making metagenomic analyses more valuable.
Affiliation(s)
- Kimberly R. Sabsay
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Princeton, NJ 08544, United States
  - Lewis Sigler Institute, Princeton University, Princeton, NJ 08544, United States
- Aartjan J.W. te Velthuis
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Princeton, NJ 08544, United States
|
4
|
López Sánchez A, Lafond M. Predicting horizontal gene transfers with perfect transfer networks. Algorithms Mol Biol 2024; 19:6. [PMID: 38321476 PMCID: PMC10848447 DOI: 10.1186/s13015-023-00242-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 10/25/2023] [Indexed: 02/08/2024] Open
Abstract
BACKGROUND: Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well known that sequence-based methods have difficulty identifying ancient transfers, since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity but still retain enough important common motifs to share character traits, for instance the same functional or expression profile. A phylogeny that has two separate clades that acquired the same character independently might indicate the presence of a transfer even in the absence of sequence similarity. OUR CONTRIBUTIONS: We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa under the assumption that characters have unique births, and that once a character is gained it is rarely lost. Examples of such traits include transposable elements, biochemical markers and the emergence of organelles, to name a few. We study the differences between our model and two similar models: perfect phylogenetic networks and ancestral recombination networks. Our goal is to initiate a study of the structural and algorithmic properties of perfect transfer networks. We then show that in polynomial time, one can decide whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa. We finally provide lower and upper bounds on the number of transfers required to explain a set of taxa, in the worst case.
Affiliation(s)
- Manuel Lafond
  - Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada
|
5
|
Ayipo YO, Ahmad I, Alananzeh W, Lawal A, Patel H, Mordi MN. Computational modelling of potential Zn-sensitive non-β-lactam inhibitors of imipenemase-1 (IMP-1). J Biomol Struct Dyn 2023; 41:10096-10116. [PMID: 36476097 DOI: 10.1080/07391102.2022.2153168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 11/24/2022] [Indexed: 12/13/2022]
Abstract
Antibiotic resistance (AR) remains one of the leading global health challenges, mostly implicated in disease-related deaths. The Enterobacteriaceae-producing metallo-β-lactamases (MBLs) are critically involved in AR pathogenesis through Zn-dependent catalytic destruction of β-lactam antibiotics, yet with limited successful clinical inhibitors. The efficacy of relevant broad-spectrum β-lactams including imipenem and meropenem is seriously challenged by their susceptibility to Zn-dependent carbapenemase hydrolysis; as such, searching for alternatives remains imperative. In this study, computational molecular modelling and virtual screening methods were extensively applied to identify new putative Zn-sensitive broad-spectrum inhibitors of MBLs, specifically imipenemase-1 (IMP-1), from the IBScreen database. Three ligands, STOCK3S-30154, STOCK3S-30418 and STOCK3S-30514, selectively displayed stronger binding interactions with the enzymes compared to the reference inhibitors, imipenem and meropenem. For instance, the ligands showed molecular docking scores of -9.450, -8.005 and -10.159 kcal/mol, and MM-GBSA values of -40.404, -31.902 and -33.680 kcal/mol, respectively, against IMP-1. In contrast, imipenem and meropenem showed docking scores of -9.038 and -10.875 kcal/mol, and MM-GBSA values of -31.184 and -32.330 kcal/mol, respectively, against the enzyme. The ligands demonstrated good thermodynamic stability and compactness in complexes with IMP-1 throughout the 100 ns molecular dynamics (MD) trajectories. Interestingly, their binding affinities and stabilities were significantly affected in contacts with the remodelled Zn-deficient IMP-1, indicating sensitivity to the carbapenemase active Zn site, however, with non-β-lactam scaffolds, tenable to resist catalytic hydrolysis. They displayed ideal drug-like ADMET properties, thus representing putative Zn-sensitive non-β-lactam inhibitors of IMP-1 amenable to further experimental studies.
Affiliation(s)
- Yusuf Oloruntoyin Ayipo
  - Centre for Drug Research, Universiti Sains Malaysia, USM, Pulau Pinang, Malaysia
  - Department of Chemistry and Industrial Chemistry, Kwara State University, Ilorin, Nigeria
- Iqrar Ahmad
  - Department of Pharmaceutical Chemistry, R. C. Patel Institute of Pharmaceutical Education and Research, Shirpur, Maharashtra, India
- Waleed Alananzeh
  - Centre for Drug Research, Universiti Sains Malaysia, USM, Pulau Pinang, Malaysia
- Amudat Lawal
  - Department of Chemistry, University of Ilorin, Ilorin, Nigeria
- Harun Patel
  - Department of Pharmaceutical Chemistry, R. C. Patel Institute of Pharmaceutical Education and Research, Shirpur, Maharashtra, India
- Mohd Nizam Mordi
  - Centre for Drug Research, Universiti Sains Malaysia, USM, Pulau Pinang, Malaysia
|
6
|
Stadler PF, Will S. Bi-alignments with affine gaps costs. Algorithms Mol Biol 2022; 17:10. [PMID: 35578255 PMCID: PMC9109335 DOI: 10.1186/s13015-022-00219-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/18/2021] [Accepted: 05/01/2022] [Indexed: 12/02/2022] Open
Abstract
Background: Commonly, sequence and structure elements are assumed to evolve congruently, such that homologous sequence positions correspond to homologous structural features. Assuming congruent evolution, alignments based on sequence and structure similarity can therefore optimize both similarities at the same time in a single alignment. To model incongruent evolution, where sequence and structural features diverge positionally, we recently introduced bi-alignments. This generalization of sequence and structure-based alignments is best understood as alignments of two distinct pairwise alignments of the same entities: one modeling sequence similarity, the other structural similarity. Results: Optimal bi-alignments with affine gap costs (or affine shift cost) for two constituent alignments can be computed exactly in quartic space and time. Even bi-alignments with affine shift and gap cost, as well as bi-alignment with sub-additive gap cost are optimized efficiently. Affine gap-cost bi-alignment of large proteins (∼930 aa) can be computed. Conclusion: Affine cost bi-alignments are of practical interest to study shifts of protein sequences and protein structures relative to each other. Availability: The affine cost bi-alignment algorithm has been implemented in Python 3 and Cython. It is available as free software from https://github.com/s-will/BiAlign/releases/tag/v0.3 and as bioconda package bialign. Supplementary Information: The online version contains supplementary material available at 10.1186/s13015-022-00219-7.
|
7
|
Li D, Zhang L. Structure Prediction and Potential Inhibitors Docking of Enterovirus 2C Proteins. Front Microbiol 2022; 13:856574. [PMID: 35572704 PMCID: PMC9100428 DOI: 10.3389/fmicb.2022.856574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 03/31/2022] [Indexed: 11/18/2022] Open
Abstract
Human enterovirus infections are mostly asymptomatic but can occasionally be severe and life-threatening. The conserved non-structural 2C protein from enteroviruses is a promising target in antiviral therapies against human enteroviruses. Understanding 2C-drug interactions is crucial for developing potential antiviral agents. While the functions of enterovirus 2C proteins have been widely studied, three-dimensional structural information on 2C is limited. In this study, the structures of 2C proteins from 20 enteroviruses were simulated and reconstructed using I-TASSER programs. Subsequent docking studies of the 22 known antiviral inhibitors of 2C proteins were performed to uncover the inhibitor-binding characteristics of 2C. Among the potential inhibitors, the compound hydantoin exhibited the highest broad-spectrum antiviral activity with binding to 2C protein. The anti-enteroviral activity of GuaHCL, compound 19b, R523062, compound 12a, compound 12b, quinoline analogs 12a, compound 19d, N6-benzyladenosine, dibucaine derivatives 6i, TBZE-029, fluoxetine analogs 2b, dibucaine, 2-(α-hydroxybenzyl)-benzimidazole (HBB), metrifudil, pirlindole, MRL-1237, quinoline analogs 10a, zuclopenthixol, fluoxetine, fluoxetine HCl, and quinoline analogs 12c showed a trend of gradual decrease. In addition, the free energy of the 22 compounds binding to EV 2C ranged from −0.35 to −88.18 kcal/mol. Our in silico studies will provide important information for the development of pan-enterovirus antiviral agents based on 2C.
Affiliation(s)
- Daoqun Li
  - Department of Clinical Laboratory Medicine, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
  - Department of Pathogen Biology, School of Basic Medical Sciences, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China
  - Medical Science and Technology Innovation Center, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China
- Leiliang Zhang
  - Department of Clinical Laboratory Medicine, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
  - Department of Pathogen Biology, School of Basic Medical Sciences, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China
  - Medical Science and Technology Innovation Center, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China
  - *Correspondence: Leiliang Zhang
|
8
|
Fukutani T, Miyazawa K, Iwata S, Satoh H. G-RMSD: Root Mean Square Deviation Based Method for Three-Dimensional Molecular Similarity Determination. Bulletin of the Chemical Society of Japan 2021. [DOI: 10.1246/bcsj.20200258] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 02/02/2023]
Affiliation(s)
- Tomonori Fukutani
  - Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Kohei Miyazawa
  - Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Satoru Iwata
  - Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Hiroko Satoh
  - Department of Chemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
  - Research Organization of Information and Systems (ROIS), 4-3-13 Toranomon, Minato-ku, Tokyo 105-0001, Japan
|
9
|
Minkin I, Pham S, Medvedev P. TwoPaCo: an efficient algorithm to build the compacted de Bruijn graph from many complete genomes. Bioinformatics 2018; 33:4024-4032. [PMID: 27659452 DOI: 10.1093/bioinformatics/btw609] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 04/03/2016] [Accepted: 09/16/2016] [Indexed: 01/06/2023] Open
Abstract
Motivation: de Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole genome sequences, in both population and comparative genomic settings. However, current approaches do not scale well to many genomes of large size (such as mammalian genomes). Results: In this article, we present TwoPaCo, a simple and scalable low-memory algorithm for the direct construction of the compacted de Bruijn graph from a set of complete genomes. We demonstrate that it can construct the graph for 100 simulated human genomes in less than a day and eight real primates in < 2 h, on a typical shared-memory machine. We believe that this progress will enable novel biological analyses of hundreds of mammalian-sized genomes. Availability and Implementation: Our code and data are available for download from github.com/medvedevgroup/TwoPaCo. Contact: ium125@psu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ilia Minkin
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Son Pham
  - BioTuring Inc., San Diego, CA 92121, USA
- Paul Medvedev
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biochemistry and Molecular Biology
  - Genomic Sciences Institute of the Huck, The Pennsylvania State University, University Park, PA 16802, USA
|
10
|
Kim C, Yang J, Jeong SH, Kim H, Park GH, Shin HB, Ro M, Kim KY, Park Y, Kim KP, Kwack K. Yeast-based assays for characterization of the functional effects of single nucleotide polymorphisms in human DNA repair genes. PLoS One 2018; 13:e0193823. [PMID: 29522548 PMCID: PMC5844570 DOI: 10.1371/journal.pone.0193823] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/20/2017] [Accepted: 02/20/2018] [Indexed: 01/03/2023] Open
Abstract
DNA repair mechanisms maintain genomic integrity upon exposure to various types of DNA damage, which cause either single- or double-strand breaks in the DNA. Here, we propose a strategy for the functional study of single nucleotide polymorphisms (SNPs) in the human DNA repair genes XPD/ERCC2, RAD18, and KU70/XRCC6 and the checkpoint activation gene ATR that are essentially involved in the cell cycle and DNA damage repair. We analyzed the mutational effects of the DNA repair genes under DNA-damaging conditions, including ultraviolet irradiation and treatment with genotoxic reagents, using a Saccharomyces cerevisiae system to overcome the limitations of the human cell-based assay. We identified causal variants from selected SNPs in the present analyses. (i) R594C SNP in RAD3 (human XPD/ERCC2) caused severe reductions in the growth rate of mutant cells upon short-wavelength UV irradiation or chemical reagent treatment. (ii) The growth rates of the selected variants in RAD18, YKU70, and MEC1 were similar to those of wild-type cells on methyl methanesulfonate and hydroxyurea treated media. (iii) We also assessed the structural impact of the SNPs by analyzing differences in the structural conformation and calculating the root mean square deviation, which is a measure of the discordance of the Cα atoms between protein structures. Based on the above results, we propose that these analytical approaches serve as efficient methods for the identification of causal variants of human disease-causing genes and elucidation of yeast-cell based molecular mechanisms.
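The root mean square deviation mentioned in point (iii) can be written out in a few lines. This is a minimal sketch that assumes the two Cα coordinate lists are already superimposed and listed in matching residue order; the abstract does not say how the authors superimposed their structures, and the function name `rmsd` is chosen here for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of 3D points (assumed pre-superimposed)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x - y) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for x, y in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))

wild_type = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
# Hypothetical variant: every atom shifted by 1 unit along z.
variant = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
shift_rmsd = rmsd(wild_type, variant)
```

Each atom is displaced by exactly one unit, so the RMSD of this toy pair is 1.0; identical structures give 0.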
Affiliation(s)
- Changshin Kim
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Jinmo Yang
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Su-Hyun Jeong
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Hayoung Kim
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Geun-hee Park
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Hwa Beom Shin
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- MyungJa Ro
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Kyoung-Yeon Kim
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- YoungJoon Park
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Keun Pil Kim
  - Department of Life Sciences, College of Natural Sciences, Chung-Ang University, Seoul, Republic of Korea
- KyuBum Kwack
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
|
11
|
Haghshenas E, Hach F, Sahinalp SC, Chauve C. CoLoRMap: Correcting Long Reads by Mapping short reads. Bioinformatics 2017; 32:i545-i551. [PMID: 27587673 DOI: 10.1093/bioinformatics/btw463] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 01/27/2023] Open
Abstract
MOTIVATION: Second generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. The recent developments in long-read sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. RESULTS: We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. AVAILABILITY AND IMPLEMENTATION: The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap CONTACT: ehaghshe@sfu.ca or cedric.chauve@sfu.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ehsan Haghshenas
  - School of Computing Sciences MADD-Gen Graduate Program, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Faraz Hach
  - School of Computing Sciences Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- S Cenk Sahinalp
  - School of Computing Sciences Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada, School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
|
12
|
Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, Birol I. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Res 2017; 27:768-777. [PMID: 28232478 PMCID: PMC5411771 DOI: 10.1101/gr.214346.116] [Citation(s) in RCA: 381] [Impact Index Per Article: 54.4] [Received: 08/07/2016] [Accepted: 02/14/2017] [Indexed: 01/19/2023]
Abstract
The assembly of DNA sequences de novo is fundamental to genomics research. It is the first of many steps toward elucidating and characterizing whole genomes. Downstream applications, including analysis of genomic variation between species, between or within individuals critically depend on robustly assembled sequences. In the span of a single decade, the sequence throughput of leading DNA sequencing instruments has increased drastically, and coupled with established and planned large-scale, personalized medicine initiatives to sequence genomes in the thousands and even millions, the development of efficient, scalable and accurate bioinformatics tools for producing high-quality reference draft genomes is timely. With ABySS 1.0, we originally showed that assembling the human genome using short 50-bp sequencing reads was possible by aggregating the half terabyte of compute memory needed over several computers using a standardized message-passing system (MPI). We present here its redesign, which departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements. We benchmarked ABySS 2.0 human genome assembly using a Genome in a Bottle data set of 250-bp Illumina paired-end and 6-kbp mate-pair libraries from a single individual. Our assembly yielded a NG50 (NGA50) scaffold contiguity of 3.5 (3.0) Mbp using <35 GB of RAM. This is a modest memory requirement by today's standards and is often available on a single computer. We also investigate the use of BioNano Genomics and 10x Genomics' Chromium data to further improve the scaffold NG50 (NGA50) of this assembly to 42 (15) Mbp.
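The idea of representing a de Bruijn graph with a Bloom filter can be sketched as follows. This is a toy illustration, not ABySS 2.0's implementation: the class name, filter size, number of hash functions, and the use of SHA-256 are all assumptions chosen for readability. The graph is implicit: k-mers are inserted into the filter, and the neighbours of a k-mer are found by querying its four possible one-base extensions.

```python
import hashlib

class KmerBloomFilter:
    """Toy Bloom filter over k-mer strings (sizes are illustrative assumptions)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive several bit positions by salting one cryptographic hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))

def kmers(sequence, k):
    """All overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def successors(bloom, kmer):
    """Implicit de Bruijn edges: keep each one-base extension the filter reports as present."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]

bloom = KmerBloomFilter()
for kmer in kmers("ACGTAC", 3):  # ACG, CGT, GTA, TAC
    bloom.add(kmer)
next_kmers = successors(bloom, "ACG")
```

Memory use depends only on the number of bits, not on k-mer length, which is where the savings over storing k-mers explicitly come from. The cost is that a query can report a k-mer that was never inserted, so a real assembler needs some way to handle false-positive edges.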
Affiliation(s)
- Shaun D Jackman
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Benjamin P Vandervalk
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Hamid Mohamadi
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Justin Chu
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Sarah Yeo
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- S Austin Hammond
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Golnaz Jahesh
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Hamza Khan
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Lauren Coombe
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Rene L Warren
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
|
13
|
Srivastava S, Lal SB, Mishra DC, Angadi UB, Chaturvedi KK, Rai SN, Rai A. An efficient algorithm for protein structure comparison using elastic shape analysis. Algorithms Mol Biol 2016; 11:27. [PMID: 27708689 PMCID: PMC5041553 DOI: 10.1186/s13015-016-0089-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 03/09/2016] [Accepted: 09/21/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein structure comparison plays an important role in in silico functional prediction of a new protein. It is also used for understanding the evolutionary relationships among proteins. A variety of methods have been proposed in the literature for comparing protein structures, but they have their own limitations in terms of accuracy and of complexity with respect to computational time and space. There is a need to improve the computational complexity of protein comparison/alignment by incorporating important biological and structural properties into the existing techniques. RESULTS An efficient algorithm has been developed for comparing protein structures using elastic shape analysis, in which the sequence of 3D coordinates of the atoms of protein structures is supplemented by auxiliary information from side-chain properties. The protein structure is represented by a special function called the square-root velocity function. Furthermore, singular value decomposition and dynamic programming have been employed for optimal rotation and optimal matching of the proteins, respectively. Also, geodesic distance has been calculated and used as the dissimilarity score between two protein structures. The developed algorithm was tested and found to be more efficient than existing methods: running time was reduced by 80-90 % without compromising accuracy of comparison. Source code for the different functions has been developed in R. Also, a user-friendly web-based application called ProtSComp has been developed using the above algorithm for comparing protein 3D structures and is freely accessible. CONCLUSIONS The methodology and algorithm developed in this study take considerably less computational time without loss of accuracy (Table 2). The proposed algorithm considers different criteria for representing protein structures, using 3D coordinates of atoms and including residue-wise molecular properties as auxiliary information.
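The square-root velocity function mentioned in the abstract can be sketched as follows, assuming the standard definition from the elastic shape analysis literature, q(t) = β'(t) / sqrt(|β'(t)|), applied to a curve β sampled uniformly on [0, 1]. The function names `srvf` and `l2_distance`, the forward-difference discretization, and the toy coordinates are illustrative assumptions, not the authors' R code. The distance shown is the plain L2 distance between two SRVFs; the rotation (via singular value decomposition) and matching (via dynamic programming) steps that the abstract describes are deliberately left out.

```python
import math


def srvf(points):
    """Discrete square-root velocity function of a 3D curve.

    points: list of (x, y, z) tuples, assumed uniformly sampled on [0, 1].
    Returns one SRVF vector per segment (forward differences).
    """
    dt = 1.0 / (len(points) - 1)
    q = []
    for p0, p1 in zip(points, points[1:]):
        velocity = [(b - a) / dt for a, b in zip(p0, p1)]
        speed = math.sqrt(sum(c * c for c in velocity))
        if speed == 0.0:
            q.append([0.0, 0.0, 0.0])  # stationary segment
        else:
            scale = 1.0 / math.sqrt(speed)
            q.append([c * scale for c in velocity])
    return q


def l2_distance(q1, q2):
    """L2 distance between two SRVFs with the same number of segments."""
    dt = 1.0 / len(q1)
    total = sum(
        (a - b) ** 2 for u, w in zip(q1, q2) for a, b in zip(u, w)
    )
    return math.sqrt(total * dt)


# Hypothetical backbone traces (e.g. C-alpha positions), not real structures.
curve_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
curve_b = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 0.5, 0.0), (3.0, 0.0, 0.0)]
print(l2_distance(srvf(curve_a), srvf(curve_b)))
```

Two properties make this representation convenient: the SRVF depends only on differences between consecutive points, so it is unchanged by translation, and the squared L2 norm of the SRVF equals the length of the curve.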
|
14
|
Gupta A, Mir SS, Saqib U, Biswas S, Vaishya S, Srivastava K, Siddiqi MI, Habib S. The effect of fusidic acid on Plasmodium falciparum elongation factor G (EF-G). Mol Biochem Parasitol 2013; 192:39-48. [PMID: 24211494 DOI: 10.1016/j.molbiopara.2013.10.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 08/13/2013] [Revised: 10/29/2013] [Accepted: 10/29/2013] [Indexed: 11/30/2022]
Abstract
Inhibition of growth of the malaria parasite Plasmodium falciparum by known translation-inhibitory antibiotics has generated interest in understanding their action on the translation apparatus of the two genome-containing organelles of the malaria parasite: the mitochondrion and the relic plastid (apicoplast). We report GTPase activity of recombinant EF-G proteins that are targeted to the organelles and further use these to test the effect of the EF-G inhibitor fusidic acid (FA) on the factor-ribosome interface. Our results monitoring locking of EF-G·GDP onto surrogate Escherichia coli ribosomes, as well as multi-turnover GTP hydrolysis by the factor, indicate that FA has a greater effect on apicoplast EF-G than on its mitochondrial counterpart. Deletion of a three-amino-acid (GVG) sequence in the switch I loop that is conserved in proteins of the mitochondrial EF-G1 family and the Plasmodium mitochondrial factor, but is absent in apicoplast EF-G, demonstrated that this motif contributes to differential inhibition of the two EF-Gs by FA. Additionally, the drug thiostrepton, which is known to target the apicoplast and proteasome, enhanced retention of only mitochondrial EF-G on ribosomes, providing support for the reported effect of the drug on parasite mitochondrial translation.
Affiliation(s)
- Ankit Gupta
- Division of Molecular and Structural Biology, CSIR-Central Drug Research Institute, Lucknow, India
- Snober S Mir
- Division of Molecular and Structural Biology, CSIR-Central Drug Research Institute, Lucknow, India
- Uzma Saqib
- Division of Molecular and Structural Biology, CSIR-Central Drug Research Institute, Lucknow, India
- Subir Biswas
- Division of Molecular and Structural Biology, CSIR-Central Drug Research Institute, Lucknow, India
- Suniti Vaishya
- Division of Molecular and Structural Biology, CSIR-Central Drug Research Institute, Lucknow, India
- Kumkum Srivastava
- Division of Parasitology, CSIR-Central Drug Research Institute, Lucknow, India
- Mohammad Imran Siddiqi
- Division of Molecular and Structural Biology, CSIR-Central Drug Research Institute, Lucknow, India
- Saman Habib
- Division of Molecular and Structural Biology, CSIR-Central Drug Research Institute, Lucknow, India
|