For: Karlin S. Statistical signals in bioinformatics. Proc Natl Acad Sci U S A 2005;102:13355-62. [PMID: 16157888 PMCID: PMC1224616 DOI: 10.1073/pnas.0501804102] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022] Open

Cited by Other Article(s)

Duality Between the Local Score of One Sequence and Constrained Hidden Markov Model. Methodol Comput Appl Probab 2022. [DOI: 10.1007/s11009-021-09856-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

Margelevičius M. Estimating statistical significance of local protein profile-profile alignments. BMC Bioinformatics 2019;20:419. [PMID: 31409275 PMCID: PMC6693267 DOI: 10.1186/s12859-019-2913-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/23/2019] [Accepted: 05/23/2019] [Indexed: 11/10/2022] Open

Screening of nucleotide variations in genomic sequences encoding charged protein regions in the human genome. BMC Genomics 2017;18:588. [PMID: 28789634 PMCID: PMC5549384 DOI: 10.1186/s12864-017-4000-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2017] [Accepted: 08/01/2017] [Indexed: 11/24/2022] Open

Abstract

Background

Studying genetic variation distribution in proteins containing charged regions, called charge clusters (CCs), is of great interest to unravel their functional role. Charge clusters are 20 to 75 residue segments with high net positive charge, high net negative charge, or high total charge relative to the overall charge composition of the protein. We previously developed a bioinformatics tool (FCCP) to detect charge clusters in proteomes and scanned the human proteome for the occurrence of CCs. In this paper we investigate the genetic variations in the human proteins harbouring CCs.

Results

We studied the coding regions of 317 positively charged clusters and 1020 negatively charged ones previously detected in human proteins. The coding parts of CCs are richer in sequence variants than their corresponding genes, full mRNAs, and exonic + intronic sequences, and these variants are predominantly rare (minor allele frequency < 0.005). Furthermore, variants occurring in the coding parts of positively charged regions of proteins are more often pathogenic than those occurring in negatively charged ones. Classifying variants by type showed that substitutions are the most common, followed by indels (insertions-deletions). Among substitutions, within clusters of both charges, charged amino acids were the group lost most often, whereas polar residues were the group gained most often.

Conclusions

Our findings highlight the prominent features of the human charged regions from the DNA up to the protein sequence which might provide potential clues to improve the current understanding of those charged regions and their implication in the emergence of diseases.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-017-4000-3) contains supplementary material, which is available to authorized users.

Lagnoux A, Mercier S, Vallois P. Statistical significance based on length and position of the local score in a model of i.i.d. sequences. Bioinformatics 2017;33:654-660. [PMID: 28035025 DOI: 10.1093/bioinformatics/btw699] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/01/2016] [Accepted: 11/08/2016] [Indexed: 11/14/2022] Open

Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter. J Theor Biol 2015;387:88-100. [PMID: 26427337 DOI: 10.1016/j.jtbi.2015.09.014] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/01/2015] [Revised: 09/10/2015] [Accepted: 09/15/2015] [Indexed: 12/20/2022]

Belmabrouk S, Kharrat N, Benmarzoug R, Rebai A. Exploring proteome-wide occurrence of clusters of charged residues in eukaryotes. Proteins 2015;83:1252-61. [PMID: 25963617 DOI: 10.1002/prot.24823] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/09/2015] [Revised: 04/17/2015] [Accepted: 04/29/2015] [Indexed: 11/09/2022]

Kryukov K, Sumiyama K, Ikeo K, Gojobori T, Saitou N. A new database (GCD) on genome composition for eukaryote and prokaryote genome sequences and their initial analyses. Genome Biol Evol 2012;4:501-12. [PMID: 22417913 PMCID: PMC3342873 DOI: 10.1093/gbe/evs026] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open

Fu WJ, Stromberg AJ, Viele K, Carroll RJ, Wu G. Statistics and bioinformatics in nutritional sciences: analysis of complex data in the era of systems biology. J Nutr Biochem 2010;21:561-72. [PMID: 20233650 DOI: 10.1016/j.jnutbio.2009.11.007] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Received: 09/26/2008] [Revised: 11/10/2009] [Accepted: 11/12/2009] [Indexed: 10/19/2022]

Benachenhou F, Jern P, Oja M, Sperber G, Blikstad V, Somervuo P, Kaski S, Blomberg J. Evolutionary conservation of orthoretroviral long terminal repeats (LTRs) and ab initio detection of single LTRs in genomic data. PLoS One 2009;4:e5179. [PMID: 19365549 PMCID: PMC2664473 DOI: 10.1371/journal.pone.0005179] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 12/04/2008] [Accepted: 03/10/2009] [Indexed: 01/06/2023] Open

Abstract

Background

Retroviral LTRs, paired or single, influence the transcription of both retroviral and non-retroviral genomic sequences. Vertebrate genomes contain many thousand endogenous retroviruses (ERVs) and their LTRs. Single LTRs are difficult to detect from genomic sequences without recourse to repetitiveness or presence in a proviral structure. Understanding of LTR structure increases understanding of LTR function, and of functional genomics. Here we develop models of orthoretroviral LTRs useful for detection in genomes and for structural analysis.

Principal Findings

Although mutated, ERV LTRs are more numerous and diverse than exogenous retroviral (XRV) LTRs. Hidden Markov models (HMMs), and alignments based on them, were created for HML- (human MMTV-like), general-beta-, gamma- and lentiretrovirus-like LTRs, plus a general-vertebrate LTR model. Training sets were XRV LTRs and RepBase LTR consensuses. The HML HMM was most sensitive and detected 87% of the HML LTRs in human chromosome 19 at 96% specificity. By combining all HMMs with a low cutoff, for screening, 71% of all LTRs found by RepeatMasker in chromosome 19 were found. HMM consensus sequences had a conserved modular LTR structure. Target site duplications (TG-CA), TATA (occasionally absent), an AATAAA box and a T-rich region were prominent features. Most of the conservation was located in, or adjacent to, R and U5, with evidence for stem loops. Several of the long HML LTRs contained long ORFs inserted after the second A-rich module. HMM consensus alignment allowed comparison of functional features like transcriptional start sites (sense and antisense) between XRVs and ERVs.

Conclusion

The modular, conserved and redundant orthoretroviral LTR structure with three A-rich regions is reminiscent of structurally relaxed Giardia promoters. The five HMMs provided novel broad-range, repeat-independent, ab initio LTR detection, with prospects for greater generalisation, and insight into LTR structure, which may aid development of LTR-targeted pharmaceuticals.

Compressing proteomes: the relevance of medium range correlations. EURASIP Journal on Bioinformatics & Systems Biology 2008:60723. [PMID: 18256727 DOI: 10.1155/2007/60723] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 01/14/2007] [Revised: 05/28/2007] [Accepted: 09/10/2007] [Indexed: 11/17/2022]

Aschard H, Guedj M, Demenais F. A two-step multiple-marker strategy for genome-wide association studies. BMC Proc 2007;1 Suppl 1:S134. [PMID: 18466477 PMCID: PMC2367542 DOI: 10.1186/1753-6561-1-s1-s134] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open

Reconsidering the significance of genomic word frequencies. Trends Genet 2007;23:543-6. [PMID: 17964682 DOI: 10.1016/j.tig.2007.07.008] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 06/26/2007] [Revised: 06/26/2007] [Accepted: 07/09/2007] [Indexed: 11/22/2022]

Chew DSH, Leung MY, Choi KP. AT excursion: a new approach to predict replication origins in viral genomes by locating AT-rich regions. BMC Bioinformatics 2007;8:163. [PMID: 17517140 PMCID: PMC1904460 DOI: 10.1186/1471-2105-8-163] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 12/08/2006] [Accepted: 05/21/2007] [Indexed: 11/12/2022] Open

Mrázek J, Karlin S. Distinctive features of large complex virus genomes and proteomes. Proc Natl Acad Sci U S A 2007;104:5127-32. [PMID: 17360339 PMCID: PMC1829274 DOI: 10.1073/pnas.0700429104] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022] Open