Reference Citation Analysis

For: Indrischek H, Wieseke N, Stadler PF, Prohaska SJ. The paralog-to-contig assignment problem: high quality gene models from fragmented assemblies. Algorithms Mol Biol 2016;11:1. [PMID: 26913054 PMCID: PMC4765045 DOI: 10.1186/s13015-016-0063-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 11/16/2015] [Accepted: 02/02/2016] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Wang Y. Algorithms for the Uniqueness of the Longest Common Subsequence. J Bioinform Comput Biol 2023;21:2350027. [PMID: 38212873 DOI: 10.1142/s0219720023500270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2024]

Khojasteh H, Khanteymoori A, Olyaee MH. Comparing protein-protein interaction networks of SARS-CoV-2 and (H1N1) influenza using topological features. Sci Rep 2022;12:5867. [PMID: 35393450 PMCID: PMC8988119 DOI: 10.1038/s41598-022-08574-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/10/2021] [Accepted: 03/03/2022] [Indexed: 01/04/2023] Open

Reinhardt F, Stadler PF. ExceS-A: an exon-centric split aligner. J Integr Bioinform 2022;19:jib-2021-0040. [PMID: 35254744 PMCID: PMC9069663 DOI: 10.1515/jib-2021-0040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Accepted: 01/12/2022] [Indexed: 11/25/2022] Open

Nagy NA, Rácz R, Rimington O, Póliska S, Orozco-terWengel P, Bruford MW, Barta Z. Draft genome of a biparental beetle species, Lethrus apterus. BMC Genomics 2021;22:301. [PMID: 33902445 PMCID: PMC8074431 DOI: 10.1186/s12864-021-07627-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2020] [Accepted: 04/13/2021] [Indexed: 11/23/2022] Open

Kaur M, Kumar A, Siddaraju NK, Fairoze MN, Chhabra P, Ahlawat S, Vijh RK, Yadav A, Arora R. Differential expression of miRNAs in skeletal muscles of Indian sheep with diverse carcass and muscle traits. Sci Rep 2020;10:16332. [PMID: 33004825 PMCID: PMC7529745 DOI: 10.1038/s41598-020-73071-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/23/2020] [Accepted: 09/03/2020] [Indexed: 12/15/2022] Open

Razo-Mendivil FG, Martínez O, Hayano-Kanashiro C. Compacta: a fast contig clustering tool for de novo assembled transcriptomes. BMC Genomics 2020;21:148. [PMID: 32046653 PMCID: PMC7014741 DOI: 10.1186/s12864-020-6528-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/02/2019] [Accepted: 01/22/2020] [Indexed: 12/25/2022] Open

Abstract

Background

RNA-Seq is the preferred method to explore transcriptomes and to estimate differential gene expression. When an organism has a well-characterized and annotated genome, reads obtained from RNA-Seq experiments can be directly mapped to that genome to estimate the number of transcripts present and relative expression levels of these transcripts. However, for unknown genomes, de novo assembly of RNA-Seq reads must be performed to generate a set of contigs that represents the transcriptome. These contig sets contain multiple transcripts, including immature mRNAs, spliced transcripts and allele variants, as well as products of close paralogs or gene families that can be difficult to distinguish. Thus, tools are needed to select a set of less redundant contigs to represent the transcriptome for downstream analyses. Here we describe the development of Compacta to produce contig sets from de novo assemblies.

Results

Compacta is a fast and flexible computational tool that allows selection of a representative set of contigs from de novo assemblies. Using a graph-based algorithm, Compacta groups contigs into clusters based on the proportion of shared reads. The user can determine the minimum coverage of the contigs to be clustered, as well as a threshold for the proportion of shared reads in the clustered contigs, thus providing a dynamic range of transcriptome compression that can be adapted according to experimental aims. We compared the performance of Compacta against state of the art clustering algorithms on assemblies from Arabidopsis, mouse and mango, and found that Compacta yielded more rapid results and had competitive precision and recall ratios. We describe and demonstrate a pipeline to tailor Compacta parameters to specific experimental aims.
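The clustering idea described in this abstract can be illustrated with a minimal sketch. This is not Compacta's implementation: the function name `cluster_contigs`, the parameters `min_reads` and `min_shared`, the use of a read count as a stand-in for coverage, the definition of "proportion of shared reads" as shared reads divided by the smaller contig's read count, and the choice of representative are all assumptions made for illustration.

```python
from itertools import combinations


def cluster_contigs(reads_by_contig, min_reads=2, min_shared=0.5):
    """Group contigs whose proportion of shared reads reaches min_shared.

    reads_by_contig maps a contig name to the IDs of the reads mapped to it.
    Returns a dict mapping each cluster's representative to its members.
    """
    # Keep only contigs with enough reads (hypothetical stand-in for coverage).
    # The floor of 1 avoids dividing by zero for empty contigs.
    floor = max(min_reads, 1)
    kept = {c: set(r) for c, r in reads_by_contig.items() if len(r) >= floor}

    # Union-find: contigs joined by an edge end up in the same component.
    parent = {c: c for c in kept}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in combinations(kept, 2):
        shared = len(kept[a] & kept[b])
        # Assumed definition: shared reads relative to the smaller contig.
        if shared / min(len(kept[a]), len(kept[b])) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for c in kept:
        clusters.setdefault(find(c), []).append(c)

    # Assumed representative: the contig with the most reads (ties by name).
    return {
        max(members, key=lambda c: (len(kept[c]), c)): sorted(members)
        for members in clusters.values()
    }
```

Raising `min_shared` splits clusters apart and lowering it merges them, which mirrors the adjustable degree of transcriptome compression the abstract describes.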

Conclusions

Compacta is a fast and flexible algorithm for the determination of optimum contig sets that represent the transcriptome for downstream analyses.

Jose JM, Yilmaz E, Magalhães J, Castells P, Ferro N, Silva MJ, Martins F. Moving from Formal Towards Coherent Concept Analysis: Why, When and How. Lecture Notes in Computer Science 2020. [PMCID: PMC7148255 DOI: 10.1007/978-3-030-45439-5_19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]

Lokits AD, Indrischek H, Meiler J, Hamm HE, Stadler PF. Tracing the evolution of the heterotrimeric G protein α subunit in Metazoa. BMC Evol Biol 2018;18:51. [PMID: 29642851 PMCID: PMC5896119 DOI: 10.1186/s12862-018-1147-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/14/2017] [Accepted: 03/06/2018] [Indexed: 01/07/2023] Open

Abstract

BACKGROUND

Heterotrimeric G proteins are fundamental signaling proteins composed of three subunits, Gα and a Gβγ dimer. The role of Gα as a molecular switch is critical for transmitting and amplifying intracellular signaling cascades initiated by an activated G protein-coupled receptor (GPCR). Despite their biochemical and therapeutic importance, the study of G protein evolution has been limited to a few model organisms. Furthermore, of the five primary Gα subfamilies, the underlying gene structure of only two has been thoroughly investigated outside of Mammalia. Therefore, our understanding of Gα emergence and evolution across phylogeny remains incomplete.

RESULTS

We have computationally identified the presence and absence of every Gα gene (GNA-) across all major branches of Deuterostomia and evaluated the conservation of the underlying exon-intron structures across these phylogenetic groups. We provide evidence of mutually exclusive exon inclusion through alternative splicing in specific lineages. Variations of splice site conservation and isoforms were found for several paralogs which coincide with conserved, putative motifs of DNA-/RNA-binding proteins. In addition to our curated gene annotations, within Primates, we identified 15 retrotranspositions, many of which have undergone pseudogenization. Most importantly, we find numerous deviations from previous findings regarding the presence and absence of individual GNA- genes, nuanced differences in phyla-specific gene copy numbers, novel paralog duplications and subsequent intron gain and loss events.

CONCLUSIONS

Our curated annotations allow us to draw more accurate inferences regarding the emergence of all Gα family members across Metazoa and to present a new, updated theory of Gα evolution. Leveraging this, our results are critical for gaining new insights into the co-evolution of the Gα subunit and its many protein binding partners, especially therapeutically relevant G protein - GPCR signaling pathways which radiated in Vertebrata evolution.

Acuña-Amador L, Primot A, Cadieu E, Roulet A, Barloy-Hubler F. Genomic repeats, misassembly and reannotation: a case study with long-read resequencing of Porphyromonas gingivalis reference strains. BMC Genomics 2018;19:54. [PMID: 29338683 PMCID: PMC5771137 DOI: 10.1186/s12864-017-4429-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/26/2017] [Accepted: 12/29/2017] [Indexed: 12/15/2022] Open

Abstract

BACKGROUND

Without knowledge of their genomic sequences, it is impossible to make functional models of the bacteria that make up human and animal microbiota. Unfortunately, the vast majority of publicly available genomes are only working drafts, an incompleteness that causes numerous problems and constitutes a major obstacle to genotypic and phenotypic interpretation. In this work, we began with an example from the class Bacteroidia in the phylum Bacteroidetes, which is preponderant among human orodigestive microbiota. We successfully identify the genetic loci responsible for assembly breaks and misassemblies and demonstrate the importance and usefulness of long-read sequencing and curated reannotation.

RESULTS

We showed that the fragmentation in Bacteroidia draft genomes assembled from massively parallel sequencing linearly correlates with genomic repeats of the same or greater size than the reads. We also demonstrated that some of these repeats, especially the long ones, correspond to misassembled loci in three reference Porphyromonas gingivalis genomes marked as circularized (thus complete or finished). We prove that even at modest coverage (30X), long-read resequencing together with PCR contiguity verification (rrn operons and an integrative and conjugative element or ICE) can be used to identify and correct the wrongly combined or assembled regions. Finally, although time-consuming and labor-intensive, consistent manual biocuration of three P. gingivalis strains allowed us to compare and correct the existing genomic annotations, resulting in a more accurate interpretation of the genomic differences among these strains.

CONCLUSIONS

In this study, we demonstrate the usefulness and importance of long-read sequencing in verifying published genomes (even when complete) and generating assemblies for new bacterial strains/species with high genomic plasticity. We also show that when combined with biological validation processes and diligent biocurated annotation, this strategy helps reduce the propagation of errors in shared databases, thus limiting false conclusions based on incomplete or misleading information.

Sheikhizadeh S, Schranz ME, Akdel M, de Ridder D, Smit S. PanTools: representation, storage and exploration of pan-genomic data. Bioinformatics 2017;32:i487-i493. [PMID: 27587666 DOI: 10.1093/bioinformatics/btw455] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 11/13/2022] Open

Indrischek H, Prohaska SJ, Gurevich VV, Gurevich EV, Stadler PF. Uncovering missing pieces: duplication and deletion history of arrestins in deuterostomes. BMC Evol Biol 2017;17:163. [PMID: 28683816 PMCID: PMC5501109 DOI: 10.1186/s12862-017-1001-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 11/14/2016] [Accepted: 06/19/2017] [Indexed: 12/13/2022] Open

Abstract

BACKGROUND

The cytosolic arrestin proteins mediate desensitization of activated G protein-coupled receptors (GPCRs) via competition with G proteins for the active phosphorylated receptors. Arrestins in active, including receptor-bound, conformation are also transducers of signaling. Therefore, this protein family is an attractive therapeutic target. The signaling outcome is believed to be a result of structural and sequence-dependent interactions of arrestins with GPCRs and other protein partners. Here we elucidated the detailed evolution of arrestins in deuterostomes.

RESULTS

Identity and number of arrestin paralogs were determined searching deuterostome genomes and gene expression data. In contrast to standard gene prediction methods, our strategy first detects exons situated on different scaffolds and then solves the problem of assigning them to the correct gene. This increases both the completeness and the accuracy of the annotation in comparison to conventional database search strategies applied by the community. The employed strategy enabled us to map in detail the duplication- and deletion history of arrestin paralogs including tandem duplications, pseudogenizations and the formation of retrogenes. The two rounds of whole genome duplications in the vertebrate stem lineage gave rise to four arrestin paralogs. Surprisingly, visual arrestin ARR3 was lost in the mammalian clades Afrotheria and Xenarthra. Duplications in specific clades, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing.

CONCLUSION

The current study traces the functional evolution of deuterostome arrestins in unprecedented detail. Based on a precise re-annotation of the exon-intron structure at nucleotide resolution, we infer the gain and loss of paralogs and patterns of conservation, co-variation and selection.

Bao E, Lan L. HALC: High throughput algorithm for long read error correction. BMC Bioinformatics 2017;18:204. [PMID: 28381259 PMCID: PMC5382505 DOI: 10.1186/s12859-017-1610-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 10/27/2016] [Accepted: 03/24/2017] [Indexed: 11/24/2022] Open

Abstract

BACKGROUND

The third generation PacBio SMRT long reads can effectively address the read length issue of the second generation sequencing technology, but contain approximately 15% sequencing errors. Several error correction algorithms have been designed to efficiently reduce the error rate to 1%, but they discard large amounts of uncorrected bases and thus lead to low throughput. This loss of bases could limit the completeness of downstream assemblies and the accuracy of analysis.

RESULTS

Here, we introduce HALC, a high throughput algorithm for long read error correction. HALC aligns the long reads to short read contigs from the same species with a relatively low identity requirement so that a long read region can be aligned to at least one contig region, including its true genome region's repeats in the contigs sufficiently similar to it (similar repeat based alignment approach). It then constructs a contig graph and, for each long read, references the other long reads' alignments to find the most accurate alignment and correct it with the aligned contig regions (long read support based validation approach). Even though some long read regions without the true genome regions in the contigs are corrected with their repeats, this approach makes it possible to further refine these long read regions with the initial insufficient short reads and correct the uncorrected regions in between. In our performance tests on E. coli, A. thaliana and Maylandia zebra data sets, HALC was able to obtain 6.7-41.1% higher throughput than the existing algorithms while maintaining comparable accuracy. The HALC corrected long reads can thus result in 11.4-60.7% longer assembled contigs than the existing algorithms.

CONCLUSIONS

The HALC software can be downloaded for free from https://github.com/lanl001/halc.

Henriques R, Ferreira FL, Madeira SC. BicPAMS: software for biological data analysis with pattern-based biclustering. BMC Bioinformatics 2017;18:82. [PMID: 28153040 PMCID: PMC5290636 DOI: 10.1186/s12859-017-1493-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 03/22/2016] [Accepted: 01/21/2017] [Indexed: 12/21/2022] Open

Abstract

BACKGROUND

Biclustering has been largely applied for the unsupervised analysis of biological data, being recognised today as a key technique to discover putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entities). However, given its computational complexity, only recent breakthroughs on pattern-based biclustering enabled efficient searches without the restrictions that state-of-the-art biclustering algorithms place on the structure and homogeneity of biclusters. As a result, pattern-based biclustering provides the unprecedented opportunity to discover non-trivial yet meaningful biological modules with putative functions, whose coherency and tolerance to noise can be tuned and made problem-specific.

METHODS

To enable the effective use of pattern-based biclustering by the scientific community, we developed BicPAMS (Biclustering based on PAttern Mining Software), a software that: 1) makes available state-of-the-art pattern-based biclustering algorithms (BicPAM (Henriques and Madeira, Alg Mol Biol 9:27, 2014), BicNET (Henriques and Madeira, Alg Mol Biol 11:23, 2016), BicSPAM (Henriques and Madeira, BMC Bioinforma 15:130, 2014), BiC2PAM (Henriques and Madeira, Alg Mol Biol 11:1-30, 2016), BiP (Henriques and Madeira, IEEE/ACM Trans Comput Biol Bioinforma, 2015), DeBi (Serin and Vingron, AMB 6:1-12, 2011) and BiModule (Okada et al., IPSJ Trans Bioinf 48(SIG5):39-48, 2007)); 2) consistently integrates their dispersed contributions; 3) further explores additional accuracy and efficiency gains; and 4) makes available graphical and application programming interfaces.

RESULTS

Results on both synthetic and real data confirm the relevance of BicPAMS for biological data analysis, highlighting its essential role for the discovery of putative modules with non-trivial yet biologically significant functions from expression and network data.

CONCLUSIONS

BicPAMS is the first biclustering tool offering the possibility to: 1) parametrically customize the structure, coherency and quality of biclusters; 2) analyze large-scale biological networks; and 3) tackle the restrictive assumptions placed by state-of-the-art biclustering algorithms. These contributions are shown to be key for an adequate, complete and user-assisted unsupervised analysis of biological data.

SOFTWARE

BicPAMS and its tutorial are available at http://www.bicpams.com.

Henriques R, Madeira SC. BiC2PAM: constraint-guided biclustering for biological data analysis with domain knowledge. Algorithms Mol Biol 2016;11:23. [PMID: 27651825 PMCID: PMC5024481 DOI: 10.1186/s13015-016-0085-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 02/02/2016] [Accepted: 08/16/2016] [Indexed: 11/10/2022] Open

Abstract

Background

Biclustering has been largely used in biological data analysis, enabling the discovery of putative functional modules from omic and network data. Despite the recognized importance of incorporating domain knowledge to guide biclustering and guarantee a focus on relevant and non-trivial biclusters, this possibility has not yet been comprehensively addressed. This results from the fact that the majority of existing algorithms are only able to deliver sub-optimal solutions with restrictive assumptions on the structure, coherency and quality of biclustering solutions, thus preventing the up-front satisfaction of knowledge-driven constraints. Interestingly, in recent years, a clearer understanding of the synergies between pattern mining and biclustering gave rise to a new class of algorithms, termed as pattern-based biclustering algorithms. These algorithms, able to efficiently discover flexible biclustering solutions with optimality guarantees, are thus positioned as good candidates for knowledge incorporation. In this context, this work aims to bridge the current lack of solid views on the use of background knowledge to guide (pattern-based) biclustering tasks.

Methods

This work extends (pattern-based) biclustering algorithms to guarantee the satisfiability of constraints derived from background knowledge and to effectively explore efficiency gains from their incorporation. In this context, we first show the relevance of constraints with succinct, (anti-)monotone and convertible properties for the analysis of expression data and biological networks. We further show how pattern-based biclustering algorithms can be adapted to effectively prune the search space in the presence of such constraints, as well as be guided in the presence of biological annotations. Relying on these contributions, we propose BiClustering with Constraints using PAttern Mining (BiC2PAM), an extension of the BicPAM and BicNET biclustering algorithms.

Results

Experimental results on biological data demonstrate the importance of incorporating knowledge within biclustering to foster efficiency and enable the discovery of non-trivial biclusters with heightened biological relevance.

Conclusions

This work provides the first comprehensive view and sound algorithm for biclustering biological data with constraints derived from user expectations, knowledge repositories and/or literature.

Girotto S, Pizzi C, Comin M. MetaProb: accurate metagenomic reads binning based on probabilistic sequence signatures. Bioinformatics 2016;32:i567-i575. [DOI: 10.1093/bioinformatics/btw466] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 12/15/2022] Open