1
Zhang L, Tao L, Ye L, He L, Zhu YZ, Zhu YD, Zhou Y. Alternative splicing and expression profile analysis of expressed sequence tags in domestic pig. Genomics Proteomics & Bioinformatics 2007; 5:25-34. [PMID: 17572361 PMCID: PMC5054103 DOI: 10.1016/s1672-0229(07)60011-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
The domestic pig (Sus scrofa domestica) is one of the most important mammals to humans. Alternative splicing is a cellular mechanism in eukaryotes that greatly increases the diversity of gene products. Expressed sequence tags (ESTs) have been widely used for gene discovery, expression profile analysis, and alternative splicing detection. In this study, a total of 712,905 ESTs extracted from 101 different non-normalized EST libraries of the domestic pig were analyzed. These EST libraries cover the nervous system, digestive system, immune system, and meat production related tissues from embryo, newborn, and adult pigs, contributing to the analysis of alternative splicing variants as well as expression profiles in tissues at various stages. A modified approach was designed to cluster and assemble large EST datasets, aiming to detect alternative splicing together with the EST abundance of each splicing variant. Much effort was made to classify alternative splicing into different types and to apply different filters to each type to get more reliable results. Finally, a total of 1,223 genes with an average of 2.8 splicing variants were detected among 16,540 unique genes. The overview of expression profiles changes when alternative splicing is taken into account.
Affiliation(s)
- Liang Zhang
  - James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
  - Hangzhou Genomics Institute, Hangzhou 310008, China
  - Current address: Zhejiang Hisun Pharmaceutical Co. Ltd. (Shanghai Office), Shanghai 200233, China
- Lin Tao
  - James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
  - Hangzhou Genomics Institute, Hangzhou 310008, China
- Lin Ye
  - James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
  - Hangzhou Genomics Institute, Hangzhou 310008, China
- Ling He
  - James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
  - Hangzhou Genomics Institute, Hangzhou 310008, China
- Yuan-Zhong Zhu
  - James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
  - Hangzhou Genomics Institute, Hangzhou 310008, China
- Yue-Dong Zhu
  - James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
  - Hangzhou Genomics Institute, Hangzhou 310008, China
- Yan Zhou
  - James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
  - Hangzhou Genomics Institute, Hangzhou 310008, China
  - Current address: School of Life Sciences, Fudan University, Shanghai 200433, China
  - Corresponding author.
2
Ferrer-Costa C, Orozco M, de la Cruz X. Sequence-based prediction of pathological mutations. Proteins 2006; 57:811-9. [PMID: 15390262 DOI: 10.1002/prot.20252] [Citation(s) in RCA: 146] [Impact Index Per Article: 8.1] [Indexed: 11/11/2022]
Abstract
The development of methods to assess the impact of amino acid mutations on human health has become an important goal in biomedical research, due to the growing number of nonsynonymous SNPs identified. Within this context, computational methods constitute a valuable tool, because they can easily process large numbers of mutations and give useful, almost cost-free, information on their pathological character. In this paper we present a computational approach to the prediction of disease-associated amino acid mutations, using only sequence-based information (amino acid properties, evolutionary information, secondary structure and accessibility predictions, and database annotations) and neural networks as a model-building tool. Mutations are predicted to be either pathological or neutral. Our results show that the method has a good overall success rate, 83%, which can reach 95% when trained for specific proteins. The methodology is fast and flexible enough to provide good estimates of the pathological character of large sets of nonsynonymous SNPs, but can also be easily adapted to give more precise predictions for proteins of special biomedical interest.
Affiliation(s)
- C Ferrer-Costa
- Molecular Modeling and Bioinformatics Unit, Institut de Recerca Biomédica, Parc Científic de Barcelona, Barcelona, Spain
3
Zhou Y, Zhou C, Ye L, Dong J, Xu H, Cai L, Zhang L, Wei L. Database and analyses of known alternatively spliced genes in plants. Genomics 2004; 82:584-95. [PMID: 14611800 DOI: 10.1016/s0888-7543(03)00204-0] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.6] [Indexed: 10/27/2022]
Abstract
Alternative splicing is an important cellular mechanism that increases the diversity of gene products. The number of alternatively spliced genes reported so far in plants is much smaller than that in mammals, but is increasing as a result of the explosive growth of available EST and genomic sequences. We have searched for all alternatively spliced genes reported in GenBank and PubMed in all plant species under Viridiplantae. After careful merging and manual review of the search results, we obtained a comprehensive, high-quality collection of 168 genes reported to be alternatively spliced in plants, spanning 44 plant species (March 22, 2003 update). We developed a relational database with a Web-based user interface to store and present the data, named the Plant Alternative Splicing Database (PASDB), freely available at http://pasdb.genomics.org.cn. We analyzed the functional categories to which these genes belong using the Gene Ontology. We also analyzed in detail the biological roles and gene structures of the four genes that are known to be alternatively spliced in more than one plant species. Finally, we studied the structural features of the splice sites in the alternatively spliced genes.
Affiliation(s)
- Yan Zhou
- Hangzhou Genomics Institute, Key Laboratory of Bioinformatics of Zhejiang Province, Zhejiang University, Hangzhou, Zhejiang 310007, China
4
Abstract
DNA repair is essential for the maintenance of genomic integrity. Consequently, altered repair capacity may impact individual health in such areas as aging and susceptibility to certain diseases. Defects in some DNA repair genes, for example, have been shown to increase cancer risk, accelerate aging and impair neurological functions. Now that over 115 genes directly involved in human DNA repair have been characterized at the DNA sequence level, the identification of single nucleotide polymorphisms (SNPs) in DNA repair genes is becoming a reality. This information will likely lead to the identification of alleles, or combinations of alleles, that affect disease predisposition. This communication summarizes SNPs identified to date in the coding region of 24 human double-strand break repair (DSBR) genes. SNP data for four of these genes were obtained by screening at least 100 individuals in our laboratory. For each SNP, the codon number, amino acid substitution, allele frequency and population information are supplied.
Affiliation(s)
- Cindy C Ruttan
- Centre for Biomedical Research, University of Victoria, P.O. Box 3020 STN CSC, Victoria, BC, Canada V8W 3N5
5
Abstract
Changes in gene expression and regulation--due in particular to the evolution of cis-regulatory DNA sequences--may underlie many evolutionary changes in phenotypes, yet little is known about the distribution of such variation in populations. We present in this study the first survey of experimentally validated functional cis-regulatory polymorphism. These data are derived from more than 140 polymorphisms involved in the regulation of 107 genes in Homo sapiens, the eukaryote species with the most available data. We find that functional cis-regulatory variation is widespread in the human genome and that the consequent variation in gene expression is twofold or greater for 63% of the genes surveyed. Transcription factor-DNA interactions are highly polymorphic, and regulatory interactions have been gained and lost within human populations. On average, humans are heterozygous at more functional cis-regulatory sites (>16,000) than at amino acid positions (<13,000), in part because of an overrepresentation among the former in multiallelic tandem repeat variation, especially (AC)(n) dinucleotide microsatellites. The role of microsatellites in gene expression variation may provide a larger store of heritable phenotypic variation, and a more rapid mutational input of such variation, than has been realized. Finally, we outline the distinctive consequences of cis-regulatory variation for the genotype-phenotype relationship, including ubiquitous epistasis and genotype-by-environment interactions, as well as underappreciated modes of pleiotropy and overdominance. Ordinary small-scale mutations contribute to pervasive variation in transcription rates and consequently to patterns of human phenotypic variation.
6
Abstract
The AH receptor (AHR) mediates toxicity of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) as well as induction of three cytochrome P450 enzymes and certain Phase II enzymes. In laboratory animals, genetic variations in the AHR lead to substantial differences in sensitivity to biochemical and toxic effects of TCDD and related compounds. Relatively few polymorphisms have been discovered in the human AHR gene; these occur predominantly in exon 10, a region that encodes a major portion of the transactivation domain of the receptor that is responsible for regulating expression of other genes. In human populations there is a wide range of variation in responses regulated by the AHR, for example, induction of CYP1A1. Some variation in human responsiveness is likely due to genetically based variations in AHR structure. Thus far, however, only one pair of polymorphisms, those at codons 517 and 570, has been shown to have a clear-cut and strong effect on the phenotype of an AHR-mediated response. The search continues for polymorphisms that alter AHR function because this receptor is a central factor in determining responses to important environmental contaminants and also plays a physiologic role in early development in mammals.
Affiliation(s)
- Patricia A Harper
- Division of Clinical Pharmacology, Research Institute, The Hospital for Sick Children, 555 University Avenue, Toronto, ON, Canada M5G 1X8.
7
Sunyaev SR, Lathe WC, Ramensky VE, Bork P. SNP frequencies in human genes: an excess of rare alleles and differing modes of selection. Trends Genet 2000; 16:335-7. [PMID: 10904261 DOI: 10.1016/s0168-9525(00)02058-8] [Citation(s) in RCA: 68] [Impact Index Per Article: 2.8] [Indexed: 11/25/2022]
Affiliation(s)
- S R Sunyaev
- European Molecular Biology Laboratory (EMBL), Heidelberg, Germany.
8
Schultz J, Doerks T, Ponting CP, Copley RR, Bork P. More than 1,000 putative new human signalling proteins revealed by EST data mining. Nat Genet 2000; 25:201-4. [PMID: 10835637 DOI: 10.1038/76069] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.
Affiliation(s)
- J Schultz
- [1] EMBL, Heidelberg, Germany. [2] Max-Delbrück-Center, Berlin-Buch, Germany
9
Brett D, Hanke J, Lehmann G, Haase S, Delbrück S, Krueger S, Reich J, Bork P. EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett 2000; 474:83-6. [PMID: 10828456 DOI: 10.1016/s0014-5793(00)01581-7] [Citation(s) in RCA: 223] [Impact Index Per Article: 9.3] [Indexed: 11/30/2022]
Abstract
Expressed sequence tag (EST) databases represent a large volume of information on expressed genes including tissue type, expression profile and exon structure. In this study we create an extensive data set of human alternative splicing. We report the analysis of 7867 non-redundant mRNAs, 3011 of which contained alternative splice forms (38% of all mRNAs analysed). From a total of 12572 ESTs, 4560 different possible alternative splice forms were detected. Interestingly, 70% of the alternative splice forms correspond to exon deletion events, with only 30% exonic insertions. We experimentally verified 19 different splice forms from 16 genes in a total subset of 20 studied; all of the respective genes are of medical relevance.
Affiliation(s)
- D Brett
- Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, Berlin-Buch 13125, Germany.
10
Sunyaev S, Ramensky V, Bork P. Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends Genet 2000; 16:198-200. [PMID: 10782110 DOI: 10.1016/s0168-9525(00)01988-0] [Citation(s) in RCA: 266] [Impact Index Per Article: 11.1] [Indexed: 10/18/2022]
Affiliation(s)
- S Sunyaev
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69012 Heidelberg, Germany.