For: Triant DA, Pearson WR. Most partial domains in proteins are alignment and annotation artifacts. Genome Biol 2015;16:99. [PMID: 25976240 PMCID: PMC4443539 DOI: 10.1186/s13059-015-0656-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 11/20/2014] [Accepted: 04/15/2015] [Indexed: 12/19/2022]

Cited by Other Article(s)

Bromberg Y, Prabakaran R, Kabir A, Shehu A. Variant Effect Prediction in the Age of Machine Learning. Cold Spring Harb Perspect Biol 2024;16:a041467. [PMID: 38621825 PMCID: PMC11216171 DOI: 10.1101/cshperspect.a041467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024]

Insana G, Martin MJ, Pearson WR. Improved selection of canonical proteins for reference proteomes. NAR Genom Bioinform 2024;6:lqae066. [PMID: 38863529 PMCID: PMC11165316 DOI: 10.1093/nargab/lqae066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 05/04/2024] [Accepted: 05/23/2024] [Indexed: 06/13/2024]

Vitting-Seerup K. Most protein domains exist as variants with distinct functions across cells, tissues and diseases. NAR Genom Bioinform 2023;5:lqad084. [PMID: 37745975 PMCID: PMC10516350 DOI: 10.1093/nargab/lqad084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/09/2023] [Accepted: 09/05/2023] [Indexed: 09/26/2023]

Bacala R, Hatcher DW, Perreault H, Fu BX. Challenges and opportunities for proteomics and the improvement of bread wheat quality. J Plant Physiol 2022;275:153743. [PMID: 35749977 DOI: 10.1016/j.jplph.2022.153743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 05/13/2022] [Accepted: 05/30/2022] [Indexed: 06/15/2023]

Kuo TCY, Hatakeyama M, Tameshige T, Shimizu KK, Sese J. Homeolog expression quantification methods for allopolyploids. Brief Bioinform 2021;21:395-407. [PMID: 30590436 PMCID: PMC7299288 DOI: 10.1093/bib/bby121] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/24/2018] [Revised: 11/06/2018] [Accepted: 11/21/2018] [Indexed: 12/19/2022]

Abstract

Genome duplication with hybridization, or allopolyploidization, occurs in animals, fungi and plants, and is especially common in crop plants. There is an increasing interest in the study of allopolyploids because of advances in polyploid genome assembly; however, the high level of sequence similarity in duplicated gene copies (homeologs) poses many challenges. Here we compared standard RNA-seq expression quantification approaches used currently for diploid species against subgenome-classification approaches which map reads to each subgenome separately. We examined mapping error using our previous and new RNA-seq data in which a subgenome is experimentally added (synthetic allotetraploid Arabidopsis kamchatica) or reduced (allohexaploid wheat Triticum aestivum versus extracted allotetraploid) as ground truth. The error rates in the two species were very similar. The standard approaches showed higher error rates (>10% using pseudo-alignment with Kallisto) while subgenome-classification approaches showed much lower error rates (<1% using EAGLE-RC, <2% using HomeoRoq). Although downstream analysis may partly mitigate mapping errors, the difference in methods was substantial in hexaploid wheat, where Kallisto appeared to have systematic differences relative to other methods. Only approximately half of the differentially expressed homeologs detected using Kallisto overlapped with those by any other method in wheat. In general, disagreement in low-expression genes was responsible for most of the discordance between methods, which is consistent with known biases in Kallisto. We also observed that there exist uncertainties in genome sequences and annotation which can affect each method differently. Overall, subgenome-classification approaches tend to perform better than standard approaches with EAGLE-RC having the highest precision.

Tran HKR, Grebenc DW, Klein TA, Whitney JC. Bacterial type VII secretion: An important player in host-microbe and microbe-microbe interactions. Mol Microbiol 2021;115:478-489. [PMID: 33410158 DOI: 10.1111/mmi.14680] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/01/2020] [Revised: 01/03/2021] [Accepted: 01/04/2021] [Indexed: 12/19/2022]

Kemena C, Dohmen E, Bornberg-Bauer E. DOGMA: a web server for proteome and transcriptome quality assessment. Nucleic Acids Res 2020;47:W507-W510. [PMID: 31076763 PMCID: PMC6602495 DOI: 10.1093/nar/gkz366] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/01/2019] [Revised: 04/18/2019] [Accepted: 04/29/2019] [Indexed: 11/16/2022]

Genomic analysis of the tryptome reveals molecular mechanisms of gland cell evolution. EvoDevo 2019;10:23. [PMID: 31583070 PMCID: PMC6767649 DOI: 10.1186/s13227-019-0138-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/20/2019] [Accepted: 09/13/2019] [Indexed: 12/25/2022]

Deutekom ES, Vosseberg J, van Dam TJP, Snel B. Measuring the impact of gene prediction on gene loss estimates in Eukaryotes by quantifying falsely inferred absences. PLoS Comput Biol 2019;15:e1007301. [PMID: 31461468 PMCID: PMC6736253 DOI: 10.1371/journal.pcbi.1007301] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 12/14/2018] [Revised: 09/10/2019] [Accepted: 08/01/2019] [Indexed: 12/25/2022]

Kirsip H, Abroi A. Protein Structure-Guided Hidden Markov Models (HMMs) as A Powerful Method in the Detection of Ancestral Endogenous Viral Elements. Viruses 2019;11:v11040320. [PMID: 30986983 PMCID: PMC6520822 DOI: 10.3390/v11040320] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/29/2019] [Revised: 03/23/2019] [Accepted: 03/27/2019] [Indexed: 12/19/2022]

Mahajan S, Ramya TNC. Nature-inspired engineering of an F-type lectin for increased binding strength. Glycobiology 2019;28:933-948. [PMID: 30202877 DOI: 10.1093/glycob/cwy082] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/14/2017] [Accepted: 09/07/2018] [Indexed: 11/13/2022]

Vaattovaara A, Leppälä J, Salojärvi J, Wrzaczek M. High-throughput sequencing data and the impact of plant gene annotation quality. J Exp Bot 2019;70:1069-1076. [PMID: 30590678 PMCID: PMC6382340 DOI: 10.1093/jxb/ery434] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/16/2018] [Accepted: 11/28/2018] [Indexed: 06/02/2023]

Cocker JM, Wright J, Li J, Swarbreck D, Dyer S, Caccamo M, Gilmartin PM. Primula vulgaris (primrose) genome assembly, annotation and gene expression, with comparative genomics on the heterostyly supergene. Sci Rep 2018;8:17942. [PMID: 30560928 PMCID: PMC6299000 DOI: 10.1038/s41598-018-36304-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 03/15/2018] [Accepted: 11/14/2018] [Indexed: 11/24/2022]

Silveira MC, Azevedo da Silva R, Faria da Mota F, Catanho M, Jardim R, R Guimarães AC, de Miranda AB. Systematic Identification and Classification of β-Lactamases Based on Sequence Similarity Criteria: β-Lactamase Annotation. Evol Bioinform Online 2018;14:1176934318797351. [PMID: 30210232 PMCID: PMC6131288 DOI: 10.1177/1176934318797351] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/15/2018] [Accepted: 08/08/2018] [Indexed: 12/11/2022]

Kinjo AR. Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families. J Theor Biol 2018;443:18-27. [PMID: 29355538 DOI: 10.1016/j.jtbi.2018.01.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/27/2017] [Revised: 01/16/2018] [Accepted: 01/17/2018] [Indexed: 12/23/2022]

Menichelli C, Gascuel O, Bréhélin L. Improving pairwise comparison of protein sequences with domain co-occurrence. PLoS Comput Biol 2018;14:e1005889. [PMID: 29293498 PMCID: PMC5766236 DOI: 10.1371/journal.pcbi.1005889] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/28/2017] [Revised: 01/12/2018] [Accepted: 11/23/2017] [Indexed: 01/17/2023]

Abstract

Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is then to integrate additional contextual information to the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed an increase of 14% of the number of significant BLAST hits and an increase of 25% of the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties show that they are close to that of standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence

Deciphering the functions of the different proteins of an organism constitutes a first step toward the understanding of its biology. Because they provide strong clues regarding protein functions, domains occupy a key position among the relevant annotations that can be assigned to a protein. Protein domains are sequential motifs that are conserved along evolution and are found in different proteins and in different combinations. One common approach for identifying the domains of a protein is to run sequence-sequence comparisons with local alignment tools such as BLAST. However, these approaches sometimes miss several hits, especially for species that are phylogenetically distant from reference organisms. We propose here an approach to increase the sensitivity of pairwise sequence comparisons. This approach makes use of the fact that protein domains tend to appear with a limited number of other domains on the same protein (the domain co-occurrence property). On P. falciparum, our approach allows identifying 2240 new domains for which, in most cases, no domain of the Pfam database could be linked.

Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proc Natl Acad Sci U S A 2017;114:11703-11708. [PMID: 29078314 PMCID: PMC5676897 DOI: 10.1073/pnas.1707642114] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Indexed: 01/13/2023]

Abstract

We question a central paradigm: namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected.

Proteins share similar segments with one another. Such “reused parts”—which have been successfully incorporated into other proteins—are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment “reuse” across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for “themes”—segments of at least 35 residues of similar sequence and structure—reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.

Koehorst JJ, Saccenti E, Schaap PJ, Martins Dos Santos VAP, Suarez-Diez M. Protein domain architectures provide a fast, efficient and scalable alternative to sequence-based methods for comparative functional genomics. F1000Res 2016;5:1987. [PMID: 27703668 PMCID: PMC5031134 DOI: 10.12688/f1000research.9416.3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Accepted: 06/26/2017] [Indexed: 11/20/2022]

Lees JG, Dawson NL, Sillitoe I, Orengo CA. Functional innovation from changes in protein domains and their combinations. Curr Opin Struct Biol 2016;38:44-52. [DOI: 10.1016/j.sbi.2016.05.016] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 03/22/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 10/21/2022]

Punta M, Mistry J. Homology-Based Annotation of Large Protein Datasets. Methods Mol Biol 2016;1415:153-176. [PMID: 27115632 DOI: 10.1007/978-1-4939-3572-7_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]

Kelley LA, Sternberg MJE. Partial protein domains: evolutionary insights and bioinformatics challenges. Genome Biol 2015;16:100. [PMID: 25986583 PMCID: PMC4436111 DOI: 10.1186/s13059-015-0663-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]

Prakash A, Bateman A. Domain atrophy creates rare cases of functional partial protein domains. Genome Biol 2015;16:88. [PMID: 25924720 PMCID: PMC4432964 DOI: 10.1186/s13059-015-0655-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 11/20/2014] [Accepted: 04/15/2015] [Indexed: 01/12/2023]