Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Thompson JD, Linard B, Lecompte O, Poch O. A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS One 2011;6:e18093. [PMID: 21483869 PMCID: PMC3069049 DOI: 10.1371/journal.pone.0018093] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.9] [Received: 11/15/2010] [Accepted: 02/21/2011] [Indexed: 12/18/2022] Open

Cited by Other Article(s)

Chong LC, Khan AM. A Systematic Bioinformatics Approach for Mapping the Minimal Set of a Viral Peptidome. Curr Protoc 2024;4:e1056. [PMID: 38856995 DOI: 10.1002/cpz1.1056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]

Abstract

Sequence changes in viral genomes generate protein sequence diversity that enables viruses to evade the host immune system, hindering the development of effective preventive and therapeutic interventions. The massive proliferation of sequence data provides unprecedented opportunities to study viral adaptation and evolution. An alignment-free approach removes various restrictions posed by an alignment-dependent approach for studying sequence diversity. The publicly available tool, UNIQmin, offers an alignment-free approach for studying viral sequence diversity at any given rank of taxonomy lineage and is big data ready. The tool performs an exhaustive search to determine the minimal set of sequences required to capture the peptidome diversity within a given dataset. This compression is possible through the removal of identical sequences and unique sequences that do not contribute effectively to the peptidome diversity pool. Herein, we describe a detailed four-part protocol utilizing UNIQmin to generate the minimal set for the purpose of viral diversity analyses, alignment-free at any rank of the taxonomy lineage, using the recent global public health threat Monkeypox virus (MPX) sequence data as a case study. The protocol enables a systematic bioinformatics approach to study sequence diversity across taxonomic lineages, which is crucial for our future preparedness against viral epidemics. This is particularly important when data are abundant, freely available, and alignment is not an option. © 2024 Wiley Periodicals LLC.

Basic Protocol 1: Tool installation and input file preparation
Basic Protocol 2: Generation of a minimal set of sequences for a given dataset
Basic Protocol 3: Comparative minimal set analysis across taxonomic lineage ranks
Basic Protocol 4: Factors affecting the minimal set of sequences
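The minimal-set idea in the abstract above (drop identical sequences, then drop sequences that add nothing to the peptide pool) can be illustrated with a rough sketch. This is not UNIQmin's code or its exhaustive search: it is a greedy set-cover approximation, and the function names, the dictionary input format, and the default k = 9 are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_minimal_set(seqs: Dict[str, str], k: int = 9) -> List[str]:
    """Greedy approximation of a minimal set of sequence IDs covering every k-mer.

    Hypothetical helper for illustration only; k = 9 is an assumed default.
    """
    # Step 1: drop identical sequences, keeping the first ID seen for each one.
    unique: Dict[str, str] = {}
    for seq_id, seq in seqs.items():
        unique.setdefault(seq, seq_id)
    kmer_sets = {seq_id: kmers(seq, k) for seq, seq_id in unique.items()}

    # Step 2: repeatedly pick the sequence that covers the most uncovered k-mers.
    uncovered: Set[str] = set().union(*kmer_sets.values())
    chosen: List[str] = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical by ID).
        best = max(sorted(kmer_sets), key=lambda s: len(kmer_sets[s] & uncovered))
        chosen.append(best)
        uncovered -= kmer_sets.pop(best)
    return chosen


if __name__ == "__main__":
    toy = {"a": "ABCDEFG", "b": "ABCDEFG", "c": "ABCD", "d": "XYZWV"}
    print(greedy_minimal_set(toy, k=3))
```

In the toy input, "b" duplicates "a" and every 3-mer of "c" already occurs in "a", so neither is selected. A greedy pass does not guarantee the smallest possible set, which is one reason a real tool may search differently.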

Saha G, Sawmya S, Saha A, Akil MA, Tasnim S, Rahman MS, Rahman MS. PRIEST: predicting viral mutations with immune escape capability of SARS-CoV-2 using temporal evolutionary information. Brief Bioinform 2024;25:bbae218. [PMID: 38742520 PMCID: PMC11091746 DOI: 10.1093/bib/bbae218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 04/04/2024] [Accepted: 04/06/2024] [Indexed: 05/16/2024] Open

Navhaya LT, Blessing DM, Yamkela M, Godlo S, Makhoba XH. A comprehensive review of the interaction between COVID-19 spike proteins with mammalian small and major heat shock proteins. Biomol Concepts 2024;15:bmc-2022-0027. [PMID: 38872399 DOI: 10.1515/bmc-2022-0027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 02/13/2023] [Indexed: 06/15/2024] Open

Chakraborty A, Hussain A, Sabnam N. Uncovering the structural stability of Magnaporthe oryzae effectors: a secretome-wide in silico analysis. J Biomol Struct Dyn 2023:1-22. [PMID: 38109060 DOI: 10.1080/07391102.2023.2292795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Accepted: 11/23/2023] [Indexed: 12/19/2023]

Fruzangohar M, Moolhuijzen P, Bakaj N, Taylor J. CoreDetector: a flexible and efficient program for core-genome alignment of evolutionary diverse genomes. Bioinformatics 2023;39:btad628. [PMID: 37878789 PMCID: PMC10663985 DOI: 10.1093/bioinformatics/btad628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 09/20/2023] [Accepted: 10/23/2023] [Indexed: 10/27/2023] Open

João M, Sena AC, Rebello VEF. On closing the inopportune gap with consistency transformation and iterative refinement. PLoS One 2023;18:e0287483. [PMID: 37440507 PMCID: PMC10343097 DOI: 10.1371/journal.pone.0287483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 06/06/2023] [Indexed: 07/15/2023] Open

Khodji H, Collet P, Thompson JD, Jeannin-Girardon A. De-MISTED: Image-based classification of erroneous multiple sequence alignments using convolutional neural networks. APPL INTELL 2023. [DOI: 10.1007/s10489-022-04390-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]

Genome structure-based Juglandaceae phylogenies contradict alignment-based phylogenies and substitution rates vary with DNA repair genes. Nat Commun 2023;14:617. [PMID: 36739280 PMCID: PMC9899254 DOI: 10.1038/s41467-023-36247-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 05/23/2022] [Accepted: 01/20/2023] [Indexed: 02/06/2023] Open

Hu Y, Buehler MJ. End-to-End Protein Normal Mode Frequency Predictions Using Language and Graph Models and Application to Sonification. ACS NANO 2022;16:20656-20670. [PMID: 36416536 DOI: 10.1021/acsnano.2c07681] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/16/2023]

Abstract

The prediction of mechanical and dynamical properties of proteins is an important frontier, especially given the greater availability of protein structures. Here we report a series of models that provide end-to-end predictions of nanodynamical properties of proteins, focused on high-throughput normal mode predictions directly from the amino acid sequence. Using neural network models within the family of Natural Language Processing and graph-based methods, we offer atomistically based mechanistic predictions of key protein mechanical features. The models include an end-to-end long short-term memory (LSTM) model, an end-to-end transformer model, a graph-based transformer model, and an equivariant graph neural network. All four models show exceptional performance, with the graph-based transformer architecture offering the best results but at the cost of requiring a graph structure as input. Conversely, the LSTM and transformer models offer end-to-end sequence-to-property prediction capabilities, providing efficient avenues for protein engineering, analysis, and design. We compare our results against published data based on a Principal Neighborhood Aggregation graph neural network, revealing that the transformer model offers better performance while also being able to predict a large set of the first 64 normal mode frequencies simultaneously. The use of the end-to-end transformer model may facilitate other downstream applications through the use of transfer learning, and it offers a comprehensive prediction of dynamical properties without any structural knowledge, directly from the amino acid sequence. We demonstrate a potential application in scientific sonification, where the normal mode frequencies are transposed to generate audible signals for a detailed analysis of subtle changes of protein sequences.

Rosignoli S, Paiardini A. Boosting the Full Potential of PyMOL with Structural Biology Plugins. Biomolecules 2022;12:biom12121764. [PMID: 36551192 PMCID: PMC9775141 DOI: 10.3390/biom12121764] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/08/2022] [Revised: 11/23/2022] [Accepted: 11/24/2022] [Indexed: 11/29/2022] Open

Wu L, Yin C, Zhu J, Wu Z, He L, Xia Y, Xie S, Qin T, Liu TY. SPRoBERTa: protein embedding learning with local fragment modeling. Brief Bioinform 2022;23:6711410. [PMID: 36136367 DOI: 10.1093/bib/bbac401] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/04/2022] [Revised: 07/18/2022] [Accepted: 08/18/2022] [Indexed: 12/14/2022] Open

Hubley R, Wheeler TJ, Smit AFA. Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families. NAR Genom Bioinform 2022;4:lqac040. [PMID: 35591887 PMCID: PMC9112768 DOI: 10.1093/nargab/lqac040] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/18/2021] [Revised: 03/29/2022] [Accepted: 04/29/2022] [Indexed: 02/06/2023] Open

Nayeem MA, Bayzid MS, Rahman AH, Shahriyar R, Rahman MS. Multiobjective Formulation of Multiple Sequence Alignment for Phylogeny Inference. IEEE TRANSACTIONS ON CYBERNETICS 2022;52:2775-2786. [PMID: 33044939 DOI: 10.1109/tcyb.2020.3020308] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]

Kostenko DO, Korotkov EV. Application of the MAHDS Method for Multiple Alignment of Highly Diverged Amino Acid Sequences. Int J Mol Sci 2022;23:ijms23073764. [PMID: 35409125 PMCID: PMC8998981 DOI: 10.3390/ijms23073764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Revised: 03/23/2022] [Accepted: 03/23/2022] [Indexed: 12/10/2022] Open

Petrov PB, Awoniyi LO, Šuštar V, Balci MÖ, Mattila PK. AutoCoEv—A High-Throughput In Silico Pipeline for Predicting Inter-Protein Coevolution. Int J Mol Sci 2022;23:ijms23063351. [PMID: 35328772 PMCID: PMC8952222 DOI: 10.3390/ijms23063351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 03/15/2022] [Accepted: 03/17/2022] [Indexed: 11/16/2022] Open

Alpert A, Nahman O, Starosvetsky E, Hayun M, Curiel TJ, Ofran Y, Shen-Orr SS. Alignment of single-cell trajectories by tuMap enables high-resolution quantitative comparison of cancer samples. Cell Syst 2022;13:71-82.e8. [PMID: 34624253 PMCID: PMC8776581 DOI: 10.1016/j.cels.2021.09.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/17/2020] [Revised: 06/20/2021] [Accepted: 09/09/2021] [Indexed: 01/21/2023]

Biological sequence analysis. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00003-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open

De Luca D, Lauritano C. Transcriptome Mining to Identify Genes of Interest: From Local Databases to Phylogenetic Inference. Methods Mol Biol 2022;2498:43-51. [PMID: 35727539 DOI: 10.1007/978-1-0716-2313-8_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Spielman SJ, Miraglia ML. Relative model selection of evolutionary substitution models can be sensitive to multiple sequence alignment uncertainty. BMC Ecol Evol 2021;21:214. [PMID: 34844571 PMCID: PMC8628390 DOI: 10.1186/s12862-021-01931-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/19/2021] [Accepted: 10/06/2021] [Indexed: 11/10/2022] Open

Generator based approach to analyze mutations in genomic datasets. Sci Rep 2021;11:21084. [PMID: 34702945 PMCID: PMC8548350 DOI: 10.1038/s41598-021-00609-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/11/2021] [Accepted: 10/13/2021] [Indexed: 11/09/2022] Open

Wang Y, Zhao Y, Pan Q. Advances, challenges and opportunities of phylogenetic and social network analysis using COVID-19 data. Brief Bioinform 2021;23:6380452. [PMID: 34601563 DOI: 10.1093/bib/bbab406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 08/04/2021] [Accepted: 09/03/2021] [Indexed: 11/15/2022] Open

Li Y. Sequence Alignment with Q-Learning Based on the Actor-Critic Model. ACM T ASIAN LOW-RESO 2021. [DOI: 10.1145/3433540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

Poirier D, Théolier J, Marega R, Delahaut P, Gillard N, Godefroy SB. Evaluation of the discriminatory potential of antibodies created from synthetic peptides derived from wheat, barley, rye and oat gluten. PLoS One 2021;16:e0257466. [PMID: 34555094 PMCID: PMC8459967 DOI: 10.1371/journal.pone.0257466] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/14/2021] [Accepted: 09/01/2021] [Indexed: 11/18/2022] Open

Abstract

Celiac disease (CD) is triggered by ingestion of gluten-containing cereals such as wheat, barley, rye and in some cases oat. The only way for affected individuals to avoid symptoms of this condition is to adopt a gluten-free diet. Thus, gluten-free foodstuffs need to be monitored in order to ensure their innocuity. For this purpose, commercial immunoassays based on recognition of defined linear gluten sequences are currently used. These immunoassays are designed to detect or quantify total gluten regardless of the cereal, and often result in over- or underestimation of the exact gluten content. In addition, Canadian regulations require a declaration of the source of gluten on the label of prepackaged foods, which cannot be done due to the limitations of existing methods. In this study, the development of new antibodies targeting discrimination of gluten sources was conducted using synthetic peptides as an immunization strategy. Fourteen synthetic peptides selected from unique linear amino acid sequences of gluten were bioconjugated to Concholepas concholepas hemocyanin (CCH) as a protein carrier, to elicit antibodies in rabbits. The resulting polyclonal antibodies (pAbs) successfully discriminated wheat, barley and oat prolamins during indirect ELISA assessments. pAbs raised against rye synthetic peptides cross-reacted evenly with wheat and rye prolamins but could still be useful to successfully discriminate gluten sources in combination with the other pAbs. Discrimination of gluten sources can be further refined and enhanced by raising monoclonal antibodies using a similar immunization strategy. A methodology capable of discriminating gluten sources, such as the one proposed in this study, could facilitate compliance with Canadian regulations on this matter. This type of discrimination could also complement current immunoassays by settling the issue of over- and underestimation of gluten content, thus improving the safety of food intended for CD and wheat-allergic patients.

Neuwald AF, Lanczycki CJ, Hodges TK, Marchler-Bauer A. Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021;2020:5850901. [PMID: 32500917 PMCID: PMC7297217 DOI: 10.1093/database/baaa042] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/26/2019] [Revised: 04/01/2020] [Accepted: 05/06/2020] [Indexed: 11/12/2022]

Abstract

For optimal performance, machine learning methods for protein sequence/structural analysis typically require as input a large multiple sequence alignment (MSA), which is often created using query-based iterative programs, such as PSI-BLAST or JackHMMER. However, because these programs align database sequences using a query sequence as a template, they may fail to detect or may tend to misalign sequences distantly related to the query. More generally, automated MSA programs often fail to align sequences correctly due to the unpredictable nature of protein evolution. Addressing this problem typically requires manual curation in the light of structural data. However, curated MSAs tend to contain too few sequences to serve as input for statistically based methods. We address these shortcomings by making publicly available a set of 252 curated hierarchical MSAs (hiMSAs), containing a total of 26 212 066 sequences, along with programs for generating from these extremely large MSAs. Each hiMSA consists of a set of hierarchically arranged MSAs representing individual subgroups within a superfamily along with template MSAs specifying how to align each subgroup MSA against MSAs higher up the hierarchy. Central to this approach is the MAPGAPS search program, which uses a hiMSA as a query to align (potentially vast numbers of) matching database sequences with accuracy comparable to that of the curated hiMSA. We illustrate this process for the exonuclease–endonuclease–phosphatase superfamily and for pleckstrin homology domains. A set of extremely large MSAs generated from the hiMSAs in this way is available as input for deep learning, big data analyses. MAPGAPS, auxiliary programs CDD2MGS, AddPhylum, PurgeMSA and ConvertMSA and links to National Center for Biotechnology Information data files are available at https://www.igs.umaryland.edu/labs/neuwald/software/mapgaps/.

Storer JM, Hubley R, Rosen J, Smit AFA. Curation Guidelines for de novo Generated Transposable Element Families. Curr Protoc 2021;1:e154. [PMID: 34138525 DOI: 10.1002/cpz1.154] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/07/2022]

Bhardwaj V, Pevzner PA, Rashtchian C, Safonova Y. Trace Reconstruction Problems in Computational Biology. IEEE TRANSACTIONS ON INFORMATION THEORY 2021;67:3295-3314. [PMID: 34176957 PMCID: PMC8224466 DOI: 10.1109/tit.2020.3030569] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]

Neuwald AF, Kolaczkowski BD, Altschul SF. eCOMPASS: evaluative comparison of multiple protein alignments by statistical score. Bioinformatics 2021;37:3456-3463. [PMID: 33983436 PMCID: PMC8545322 DOI: 10.1093/bioinformatics/btab374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2020] [Revised: 03/31/2021] [Accepted: 05/12/2021] [Indexed: 11/21/2022] Open

Akand EH, Murray JM. NGlyAlign: an automated library building tool to align highly divergent HIV envelope sequences. BMC Bioinformatics 2021;22:54. [PMID: 33557755 PMCID: PMC7869453 DOI: 10.1186/s12859-020-03901-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2020] [Accepted: 11/23/2020] [Indexed: 08/29/2023] Open

Abstract

BACKGROUND

The high variability in envelope regions of some viruses such as HIV allows the virus to establish infection and to escape subsequent immune surveillance. This variability, as well as increasing incorporation of N-linked glycosylation sites, is fundamental to this evasion. It also creates difficulties for the multiple sequence alignment (MSA) methods that provide the first step in their analysis. Existing MSA tools often fail to properly align highly variable HIV envelope sequences, requiring extensive manual editing that is impractical with even a moderate number of these variable sequences.

RESULTS

We developed an automated library building tool, NGlyAlign, that organizes similar N-linked glycosylation sites as block constraints and statistically conserved global sites as single site constraints to automatically enforce partial columns in consistency-based MSA methods such as Dialign. This combined method accurately aligns variable HIV-1 envelope sequences. We tested the method on two datasets: a set of 156 founder and chronic gp160 HIV-1 subtype B sequences as well as a set of reference sequences of gp120 in the highly variable region 1. On measures such as entropy scores, sum of pair scores, column score, and similarity heat maps, NGlyAlign+Dialign proved superior to methods such as T-Coffee, ClustalOmega, ClustalW, Praline, HIValign and Muscle. The method is scalable to large sequence sets, producing accurate alignments without requiring manual editing. As well as this application to HIV, our method can be used for other highly variable glycoproteins such as hepatitis C virus envelope.

CONCLUSIONS

NGlyAlign is an automated tool for mapping and building glycosylation motif libraries to accurately align highly variable regions in HIV sequences. It can provide the basis for many studies reliant on single robust alignments. NGlyAlign has been developed as an open-source tool and is freely available at https://github.com/UNSW-Mathematical-Biology/NGlyAlign_v1.0 .
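Two of the alignment measures named in the abstract above, sum-of-pairs score and column score, can be sketched under one common reference-based definition: the share of reference residue pairs, or of whole reference columns, that a test alignment reproduces. The abstract does not say which definitions the authors used, so the function names and the definitions below are assumptions for illustration. The sketch also assumes both alignments hold the same sequences in the same order, with rows of equal length and "-" as the gap character.

```python
from itertools import combinations
from typing import List, Set, Tuple

Column = Tuple[int, ...]


def columns(alignment: List[str], gap: str = "-") -> List[Column]:
    """Turn alignment rows into columns of residue indices (-1 marks a gap)."""
    counters = [0] * len(alignment)  # next residue index for each row
    cols: List[Column] = []
    for j in range(len(alignment[0])):
        col = []
        for i, row in enumerate(alignment):
            if row[j] == gap:
                col.append(-1)
            else:
                col.append(counters[i])
                counters[i] += 1
        cols.append(tuple(col))
    return cols


def aligned_pairs(cols: List[Column]) -> Set[Tuple[int, int, int, int]]:
    """Collect (row_i, residue_a, row_j, residue_b) for residues sharing a column."""
    pairs = set()
    for col in cols:
        for (i, a), (j, b) in combinations(enumerate(col), 2):
            if a >= 0 and b >= 0:
                pairs.add((i, a, j, b))
    return pairs


def sp_score(test: List[str], reference: List[str]) -> float:
    """Fraction of reference residue pairs also aligned in the test alignment."""
    ref_pairs = aligned_pairs(columns(reference))
    test_pairs = aligned_pairs(columns(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def column_score(test: List[str], reference: List[str]) -> float:
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = columns(reference)
    test_cols = set(columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)


if __name__ == "__main__":
    reference = ["AC-G", "ACTG"]
    test = ["ACG-", "ACTG"]
    print(sp_score(test, reference), column_score(test, reference))
```

Both functions divide by a reference count, so a reference alignment with no aligned residue pairs would raise a ZeroDivisionError in `sp_score`; a production scorer would need to handle that case and would usually restrict scoring to trusted core regions.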

New Approaches for Inferring Phylogenies in the Presence of Paralogs. Trends Genet 2021;37:174-187. [DOI: 10.1016/j.tig.2020.08.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/08/2020] [Revised: 08/13/2020] [Accepted: 08/19/2020] [Indexed: 12/18/2022]

Optical pattern generator for efficient bio-data encoding in a photonic sequence comparison architecture. PLoS One 2021;16:e0245095. [PMID: 33449928 PMCID: PMC7810328 DOI: 10.1371/journal.pone.0245095] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/25/2020] [Accepted: 12/21/2020] [Indexed: 11/19/2022] Open

Perrin A, Rocha EPC. PanACoTA: a modular tool for massive microbial comparative genomics. NAR Genom Bioinform 2021;3:lqaa106. [PMID: 33575648 PMCID: PMC7803007 DOI: 10.1093/nargab/lqaa106] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/16/2020] [Revised: 11/10/2020] [Accepted: 12/01/2020] [Indexed: 02/06/2023] Open

Risser F, Collin S, Dos Santos-Morais R, Gruez A, Chagot B, Weissman KJ. Towards improved understanding of intersubunit interactions in modular polyketide biosynthesis: Docking in the enacyloxin IIa polyketide synthase. J Struct Biol 2020;212:107581. [DOI: 10.1016/j.jsb.2020.107581] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/03/2020] [Revised: 07/14/2020] [Accepted: 07/16/2020] [Indexed: 12/26/2022]

Portik DM, Wiens JJ. Do Alignment and Trimming Methods Matter for Phylogenomic (UCE) Analyses? Syst Biol 2020;70:440-462. [PMID: 32797207 DOI: 10.1093/sysbio/syaa064] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/15/2019] [Revised: 08/02/2020] [Accepted: 08/03/2020] [Indexed: 11/14/2022] Open

Abstract

Alignment is a crucial issue in molecular phylogenetics because different alignment methods can potentially yield very different topologies for individual genes. But it is unclear if the choice of alignment methods remains important in phylogenomic analyses, which incorporate data from hundreds or thousands of genes. For example, problematic biases in alignment might be multiplied across many loci, whereas alignment errors in individual genes might become irrelevant. The issue of alignment trimming (i.e., removing poorly aligned regions or missing data from individual genes) is also poorly explored. Here, we test the impact of 12 different combinations of alignment and trimming methods on phylogenomic analyses. We compare these methods using published phylogenomic data from ultraconserved elements (UCEs) from squamate reptiles (lizards and snakes), birds, and tetrapods. We compare the properties of alignments generated by different alignment and trimming methods (e.g., length, informative sites, missing data). We also test whether these data sets can recover well-established clades when analyzed with concatenated (RAxML) and species-tree methods (ASTRAL-III), using the full data (~5000 loci) and subsampled data sets (10% and 1% of loci). We show that different alignment and trimming methods can significantly impact various aspects of phylogenomic data sets (e.g., length, informative sites). However, these different methods generally had little impact on the recovery and support values for well-established clades, even across very different numbers of loci. Nevertheless, our results suggest several "best practices" for alignment and trimming. Intriguingly, the choice of phylogenetic methods impacted the phylogenetic results most strongly, with concatenated analyses recovering significantly more well-established clades (with stronger support) than the species-tree analyses.
[Alignment; concatenated analysis; phylogenomics; sequence length heterogeneity; species-tree analysis; trimming].

Carpentier M, Chomilier J. Protein multiple alignments: sequence-based versus structure-based programs. Bioinformatics 2020;35:3970-3980. [PMID: 30942864 DOI: 10.1093/bioinformatics/btz236] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/04/2018] [Revised: 03/05/2019] [Accepted: 04/02/2019] [Indexed: 11/14/2022] Open

Jermiin LS, Catullo RA, Holland BR. A new phylogenetic protocol: dealing with model misspecification and confirmation bias in molecular phylogenetics. NAR Genom Bioinform 2020;2:lqaa041. [PMID: 33575594 PMCID: PMC7671319 DOI: 10.1093/nargab/lqaa041] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/17/2020] [Revised: 05/18/2020] [Accepted: 06/04/2020] [Indexed: 12/15/2022] Open

Features spaces and a learning system for structural-temporal data, and their application on a use case of real-time communication network validation data. PLoS One 2020;15:e0228434. [PMID: 32027668 PMCID: PMC7004316 DOI: 10.1371/journal.pone.0228434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2018] [Accepted: 01/15/2020] [Indexed: 11/21/2022] Open

Nunez-Castilla J, Siltberg-Liberles J. An Easy Protocol for Evolutionary Analysis of Intrinsically Disordered Proteins. Methods Mol Biol 2020;2141:147-177. [PMID: 32696356 DOI: 10.1007/978-1-0716-0524-0_7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]

Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 2019;20:238. [PMID: 31727128 PMCID: PMC6857279 DOI: 10.1186/s13059-019-1832-y] [Citation(s) in RCA: 2922] [Impact Index Per Article: 584.4] [Received: 04/24/2019] [Accepted: 09/23/2019] [Indexed: 12/22/2022] Open

Zhang D, Gao F, Jakovlić I, Zou H, Zhang J, Li WX, Wang GT. PhyloSuite: An integrated and scalable desktop platform for streamlined molecular sequence data management and evolutionary phylogenetics studies. Mol Ecol Resour 2019;20:348-355. [DOI: 10.1111/1755-0998.13096] [Citation(s) in RCA: 825] [Impact Index Per Article: 165.0] [Received: 05/13/2019] [Revised: 09/12/2019] [Accepted: 09/24/2019] [Indexed: 01/12/2023]

Cuevas-Caballé C, Riutort M, Álvarez-Presas M. Diet assessment of two land planarian species using high-throughput sequencing data. Sci Rep 2019;9:8679. [PMID: 31213615 PMCID: PMC6581950 DOI: 10.1038/s41598-019-44952-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/29/2018] [Accepted: 05/29/2019] [Indexed: 11/30/2022] Open

Assessing the Evolutionary Conservation of Protein Disulphide Bonds. Methods Mol Biol 2019. [PMID: 31069762 DOI: 10.1007/978-1-4939-9187-7_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/13/2023]

Alazem O, Abramyan J. Reptile enamel matrix proteins: Selection, divergence, and functional constraint. J Exp Zool B Mol Dev Evol 2019;332:136-148. [PMID: 31045323 DOI: 10.1002/jez.b.22857] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/30/2018] [Revised: 02/24/2019] [Accepted: 04/01/2019] [Indexed: 12/14/2022]

Nute M, Saleh E, Warnow T. Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets. Syst Biol 2019;68:396-411. [PMID: 30329135 PMCID: PMC6472439 DOI: 10.1093/sysbio/syy068] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/18/2018] [Revised: 09/27/2018] [Accepted: 10/11/2018] [Indexed: 01/15/2023] Open

Mangul S, Martin LS, Hill BL, Lam AKM, Distler MG, Zelikovsky A, Eskin E, Flint J. Systematic benchmarking of omics computational tools. Nat Commun 2019;10:1393. [PMID: 30918265 PMCID: PMC6437167 DOI: 10.1038/s41467-019-09406-4] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 07/23/2018] [Accepted: 03/06/2019] [Indexed: 01/11/2023] Open

Liu L, Wang H. The Recent Applications and Developments of Bioinformatics and Omics Technologies in Traditional Chinese Medicine. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190102125403] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/18/2022]

High-Throughput Reconstruction of Ancestral Protein Sequence, Structure, and Molecular Function. Methods Mol Biol 2019;1851:135-170. [PMID: 30298396 DOI: 10.1007/978-1-4939-8736-8_8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]

Abstract

Ancestral protein sequence reconstruction is a powerful technique for explicitly testing hypotheses about the evolution of molecular function, allowing researchers to meticulously dissect how historical changes in protein sequence impacted functional repertoire by altering the protein's 3D structure. These techniques have provided concrete, experimentally validated insights into ancient evolutionary processes and help illuminate the complex relationship between protein sequence, structure, and function. Inferring the protein family phylogenies on which ancestral sequence reconstruction depends and reconstructing the sequences themselves are amenable to high-throughput computational analysis. However, determining the structures of ancestral-reconstructed proteins and characterizing their functions typically rely on time-consuming and expensive laboratory analyses, limiting most current studies to examining a relatively small number of specific hypotheses. For this reason, we have little detailed, unbiased information about how molecular function evolves across large protein family phylogenies. Here we describe a generalized protocol that integrates ancestral sequence reconstruction with structural homology modeling and structure-based molecular affinity prediction to characterize historical changes in protein function across families with thousands of individual sequences. We highlight key steps in the analysis protocol requiring particularly careful attention to avoid introducing potential errors as well as steps for which computationally efficient subroutines can be substituted for more intensive approaches, allowing researchers to scale the analysis up or down, depending on available resources and requirements for reproducibility and scientific rigor. In our view, this approach provides a compelling complement to more laboratory-intensive procedures, generating important contextual information that can help guide detailed experiments.

Ashkenazy H, Sela I, Levy Karin E, Landan G, Pupko T. Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction. Syst Biol 2019;68:117-130. [PMID: 29771363 PMCID: PMC6657586 DOI: 10.1093/sysbio/syy036] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/12/2017] [Revised: 05/07/2018] [Accepted: 05/09/2018] [Indexed: 01/11/2023] Open

Wang Y, Wu H, Cai Y. A benchmark study of sequence alignment methods for protein clustering. BMC Bioinformatics 2018;19:529. [PMID: 30598070 PMCID: PMC6311937 DOI: 10.1186/s12859-018-2524-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022] Open

Zhou PY, Sze-To A, Wong AKC. Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics. BMC Med Genomics 2018;11:103. [PMID: 30453949 PMCID: PMC6245498 DOI: 10.1186/s12920-018-0417-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022] Open

Abstract

Background

A protein family has similar and diverse functions that are locally conserved. An aligned pattern cluster (APC) can reflect the conserved functionality. Discovering aligned residue associations (ARAs) in APCs can reveal subtle inner working characteristics of conserved regions of protein families. However, ARAs corresponding to different functionalities/subgroups/classes could be entangled because of multiple subtle, entwined factors.

Methods

To discover and disentangle patterns from mixed-mode datasets, such as APCs in which the residues are replaced by a list of their fundamental biochemical properties, this paper presents a novel method, Extended Aligned Residual Association Discovery and Disentanglement (E-ARADD). E-ARADD discretizes the numerical dataset to transform the mixed-mode dataset into an event-value dataset, constructs an ARA Frequency Matrix and then converts it into an adjusted Statistical Residual (SR) Vector Space (SRV) capturing statistical deviation from randomness. By applying Principal Component (PC) Decomposition on SRV, PCs ranked by their variance are obtained. Finally, the disentangled ARAs are discovered when the projections on a PC are re-projected to a vector space with the same basis vectors as SRV.

Results

Experiments on synthetic, cytochrome c and class A scavenger data have shown that E-ARADD can a) disentangle the entwined ARAs in APCs (with residues or biochemical properties), and b) reveal subtle AR clusters relating to classes, subtle subgroups or specific functionalities.

Conclusions

E-ARADD can discover and disentangle ARs and ARAs entangled in functionality and location of protein families to reveal functional subgroups and subgroup characteristics of biological conserved regions. Experimental results on synthetic data provide proof-of-concept validation of the successful disentanglement that reveals class-associated ARAs with or without class labels as input. Experiments on cytochrome c data proved the efficacy of E-ARADD in handling both types of residue data. Our novel methodology is able not only to discover and disentangle ARs and ARAs in specific statistical/functional (PCs and RSRVs) spaces, but also to identify their locations in the protein family functional domains. The success of E-ARADD shows its great potential for proteomic research, drug discovery and precision and personalized genetic medicine.

Lowe EK, Garm AL, Ullrich-Lüter E, Cuomo C, Arnone MI. The crowns have eyes: multiple opsins found in the eyes of the crown-of-thorns starfish Acanthaster planci. BMC Evol Biol 2018;18:168. [PMID: 30419810 PMCID: PMC6233551 DOI: 10.1186/s12862-018-1276-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/27/2017] [Accepted: 10/18/2018] [Indexed: 01/01/2023] Open