1
Rzeszutek I, Singh A. Small RNAs, Big Diseases. Int J Mol Sci 2020; 21:E5699. [PMID: 32784829 PMCID: PMC7460979 DOI: 10.3390/ijms21165699] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/27/2020] [Revised: 08/06/2020] [Accepted: 08/08/2020] [Indexed: 02/06/2023]
Abstract
The past two decades have seen extensive research to pinpoint the role of microRNAs (miRNAs), which has led to the discovery of thousands of miRNAs in humans. It is therefore not surprising that many of them have been implicated in common as well as rare human diseases. In this review article, we summarize progress in miRNA-related research on different types of cancer and neurodegenerative diseases, as well as the potential of miRNAs for generating more reliable diagnostic and therapeutic approaches.
Affiliation(s)
- Iwona Rzeszutek
- Institute of Biology and Biotechnology, Department of Biotechnology, University of Rzeszow, Pigonia 1, 35-310 Rzeszow, Poland
- Aditi Singh
- Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
2
Campbell MJ. Tales from topographic oceans: topologically associated domains and cancer. Endocr Relat Cancer 2019; 26:R611-R626. [PMID: 31505466 PMCID: PMC7664306 DOI: 10.1530/erc-19-0348] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/23/2019] [Accepted: 09/09/2019] [Indexed: 01/03/2023]
Abstract
The 3D organization of the genome within the cell nucleus has come into sharp focus over the last decade. This has largely arisen because of the application of genomic approaches that have revealed numerous levels of genomic and chromatin interactions, including topologically associated domains (TADs). The current review examines how these domains were identified, how they are organized, how their boundaries arise and are regulated, and how genes within TADs are coordinately regulated. There are many examples of disruption to TAD structure in cancer, and the altered regulation, structure and function of TADs are discussed in the context of hormone-responsive cancers, including breast, prostate and ovarian cancer. Finally, some aspects of the statistical insight and computational skills required to interrogate TAD organization are considered and future directions discussed.
Affiliation(s)
- Moray J Campbell
- Division of Pharmaceutics and Pharmaceutical Chemistry, College of Pharmacy, The Ohio State University, Columbus, Ohio, USA
3
Kingan SB, Urban J, Lambert CC, Baybayan P, Childers AK, Coates B, Scheffler B, Hackett K, Korlach J, Geib SM. A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system. Gigascience 2019; 8:giz122. [PMID: 31609423 PMCID: PMC6791401 DOI: 10.1093/gigascience/giz122] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 05/09/2019] [Revised: 08/08/2019] [Accepted: 09/17/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. RESULTS The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. CONCLUSIONS We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
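The coverage and contiguity figures quoted in this abstract can be checked with a short sketch. It assumes that coverage is simply unique sequenced bases divided by the 2.3 Gb assembly size, and that contig N50 follows the usual definition (the contig length at which the cumulative length of contigs, sorted longest first, reaches half the assembly). The function names and the toy contig lengths are illustrative and do not come from the paper.

```python
def fold_coverage(sequenced_bases, genome_size):
    """Average depth: sequenced bases divided by genome size."""
    return sequenced_bases / genome_size


def contig_n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Figures quoted in the abstract: 82 Gb of unique reads, 2.3 Gb assembly.
print(round(fold_coverage(82e9, 2.3e9), 1))

# Toy contig lengths (hypothetical, not from the paper).
print(contig_n50([6, 5, 4, 3, 2]))
```

Under these assumptions the first value comes out near 36, in line with the "∼36× coverage" stated above.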
Affiliation(s)
- Sarah B Kingan
- Pacific Biosciences, 1305 O'Brien Drive, Menlo Park, CA 94025, USA
- Julie Urban
- Department of Entomology, 501 ASI Building, The Pennsylvania State University, University Park, PA 16802, USA
- Primo Baybayan
- Pacific Biosciences, 1305 O'Brien Drive, Menlo Park, CA 94025, USA
- Anna K Childers
- USDA-ARS, Bee Research Laboratory, 10300 Baltimore Avenue, Building 306, Room 315, BARC-East, Beltsville, MD 20705, USA
- Brad Coates
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, 2333 Genetics Laboratory, 819 Wallace Road, Ames, IA 50011, USA
- Brian Scheffler
- USDA-ARS, Genomics and Bioinformatics Research, 141 Experiment Station Road, Stoneville, MS 38776, USA
- Kevin Hackett
- USDA-ARS, Office of National Programs, George Washington Carver Center, 5601 Sunnyside Avenue, Beltsville, MD 20705, USA
- Jonas Korlach
- Pacific Biosciences, 1305 O'Brien Drive, Menlo Park, CA 94025, USA
- Scott M Geib
- USDA-ARS, Daniel K Inouye U.S. Pacific Basin Agricultural Research Center, 64 Nowelo St., Hilo, HI 96720, USA
4
Abascal F, Juan D, Jungreis I, Kellis M, Martinez L, Rigau M, Rodriguez JM, Vazquez J, Tress ML. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res 2019; 46:7070-7084. [PMID: 29982784 PMCID: PMC6101605 DOI: 10.1093/nar/gky587] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 05/15/2018] [Accepted: 06/18/2018] [Indexed: 12/16/2022]
Abstract
Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation on the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggests that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.
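The proportions in this abstract can be reproduced with a few lines of arithmetic. The sketch below assumes that the "almost one in five" of the title corresponds to the 2764 disputed genes plus the 1470 atypical coding genes, taken over the 22 210 listed coding genes. That reading is an inference from the numbers above, not something the abstract states explicitly, and the variable names are illustrative.

```python
# Gene counts quoted in the abstract.
TOTAL_CODING_GENES = 22_210
DISPUTED = 2_764             # coding in some reference sets, not in others
CODING_BUT_ATYPICAL = 1_470  # coding in all three sets, but with non-coding features


def fraction(part, whole=TOTAL_CODING_GENES):
    """Share of the listed coding genes represented by `part`."""
    return part / whole


disputed_share = fraction(DISPUTED)
# Assumption: "unresolved" means disputed plus atypical genes.
unresolved_share = fraction(DISPUTED + CODING_BUT_ATYPICAL)

print(f"{disputed_share:.1%}")    # roughly one in eight
print(f"{unresolved_share:.1%}")  # almost one in five
```

On these figures the disputed genes come to about 12.4% of the total and the combined set to about 19.1%, which appears consistent with both the "one in eight" and the "almost one in five" statements.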
Affiliation(s)
- Federico Abascal
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
- David Juan
- Comparative Genomics Lab, Instituto de Biologica Evolutiva, Universitat Pompeu Fabra, Barcelona, Spain
- Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA and Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laura Martinez
- Bioinformatics Unit, Spanish National Cancer Research Centre, Madrid, Spain
- Maria Rigau
- Computational Biology Life Sciences Group, Barcelona Supercomputing Center, Barcelona, Spain
- Jose Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain
- Jesus Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre, Madrid, Spain
5
Affiliation(s)
- J. Rich
- CNRS UMR 8126, Université Paris-Sud 11, Institut Gustave Roussy
- V. V. Ogryzko
- CNRS UMR 8126, Université Paris-Sud 11, Institut Gustave Roussy
6
Dunham I, Beare DM, Collins JE. The characteristics of human genes: analysis of human chromosome 22. Comp Funct Genomics 2010; 4:635-46. [PMID: 18629020 PMCID: PMC2447302 DOI: 10.1002/cfg.335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/15/2003] [Revised: 09/04/2003] [Accepted: 09/08/2003] [Indexed: 11/11/2022]
Affiliation(s)
- Ian Dunham
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
7
In-tube transfection improves the efficiency of gene transfer in primary neuronal cultures. J Neurosci Methods 2008; 177:348-54. [PMID: 19014969 DOI: 10.1016/j.jneumeth.2008.10.023] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 08/30/2008] [Revised: 10/14/2008] [Accepted: 10/15/2008] [Indexed: 11/20/2022]
Abstract
To facilitate genetic studies in primary neurons, we analyzed the efficiency of cationic lipid-mediated plasmid DNA transfection using adherent and acutely dissociated neuronal suspensions derived from embryonic mouse cortical tissue. Compared to transfections using adherent cultures, the in-tube procedure enhanced the delivery of a GFP reporter plasmid four- to eightfold, depending on the age of the harvested embryo. The procedure required relatively brief complex incubation times, and supported the transfection of cells expressing the neuronal markers NeuN and TuJ1 with improved uniformity in transfection events across the well surface. To demonstrate the utility of this approach in studying the genetic mechanisms controlling neuron development, we provide data regarding the role of the bZIP transcription factor c/EBP-beta in regulating neurite outgrowth. It is anticipated that this in vitro protocol will facilitate the identification of novel genes involved in both developmental and disease-relevant signaling pathways.
8
Meagher RB, Kandasamy MK, McKinney EC. Multicellular development and protein-protein interactions. Plant Signal Behav 2008; 3:333-6. [PMID: 19841663 PMCID: PMC2634275 DOI: 10.4161/psb.3.5.5343] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/23/2007] [Accepted: 11/28/2007] [Indexed: 05/20/2023]
Abstract
The macroevolution of organs and tissues in higher plants and animals may have been contingent upon the expansion of numerous gene families encoding interacting proteins. For example, there are dozens of gene families encoding actin cytoskeletal proteins that elaborate intercellular structures influencing development. Once gene family members evolve compartmentalized expression, protein isovariants are free to coevolve new interacting partners that may be incompatible with other related protein networks. Ancient classes of actin isovariants and actin-binding proteins are clear examples of such coevolving networks. Ectopic expression and suppression studies were used to dissect these interactions. In higher plants, the ectopic expression of a reproductive actin isovariant in vegetative cell types causes aberrant reorganization of the F-actin cytoskeleton and bizarre development of most organs and tissues. In contrast, overexpression of vegetative actin in vegetative cell types has little effect. The extreme ectopic actin expression phenotypes are suppressed by the coectopic expression of reproductive profilin or actin depolymerizing factor (ADF/cofilin) isovariants, but not by the overexpression of vegetative profilin or ADF. These data provide evidence for the coevolution of organ-specific protein-protein interactions. Thus, understanding the contingent relationships between the evolution of organ-specific isovariant networks and organ origination may be key to explaining multicellular development.
Affiliation(s)
- Richard B Meagher
- Department of Genetics; Davison Life Sciences Building; University of Georgia; Athens, Georgia USA
9
Levitsky VG, Ignatieva EV, Ananko EA, Turnaev II, Merkulova TI, Kolchanov NA, Hodgman TC. Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions. BMC Bioinformatics 2007; 8:481. [PMID: 18093302 PMCID: PMC2265442 DOI: 10.1186/1471-2105-8-481] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Received: 07/22/2007] [Accepted: 12/19/2007] [Indexed: 12/22/2022]
Abstract
BACKGROUND Reliable transcription factor binding site (TFBS) prediction methods are essential for computer annotation of large amounts of genome sequence data. However, current methods to predict TFBSs are hampered by the high false-positive rates that occur when only sequence conservation at the core binding sites is considered. RESULTS To improve this situation, we have quantified the performance of several Position Weight Matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to biomedically important TFBSs involved in the regulation of cell growth and proliferation as well as in inflammatory, immune, and antiviral responses (NF-κB, ISGF3, IRF1, STAT1), obesity and lipid metabolism (PPAR, SREBP, HNF4), and the regulation of steroidogenic (SF-1) and cell cycle (E2F) gene expression. We have also gained extra specificity using a method, entitled SiteGA, which takes into account structural interactions within TFBS core and flanking regions, using a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies. To ensure a higher confidence in our approach, we applied resampling-jackknife and bootstrap tests for the comparison; optimized PWM and SiteGA showed similar recognition performances. Then we applied SiteGA and optimized PWMs (both separately and together) to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for BSs using the web tool, SiteGA. Analysis of dependencies between close and distant LPDs revealed by SiteGA models has shown that the most significant correlations are between close LPDs, and are generally located in the core (footprint) region. A greater number of less significant correlations are mainly between distant LPDs, which spanned both core and flanking regions. When SiteGA and optimized PWM models were applied together, this substantially reduced false positives, at least at higher stringencies. CONCLUSION Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It adds to the range of techniques available for TFBS prediction, and EPD analysis has led to a list of genes which appear to be regulated by the above TFs.
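For readers unfamiliar with position weight matrices, the following is a minimal sketch of a generic log-odds PWM built from aligned binding sites and used to scan a sequence. It is not the optimized PWM procedure or the SiteGA method described in the abstract. The pseudocount, the uniform background of 0.25, the threshold and the example sites are all assumptions chosen for illustration, and sequences are assumed to contain only uppercase A, C, G and T.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites (uniform background assumed)."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window_score = score(pwm, sequence[start:start + width])
        if window_score >= threshold:
            hits.append((start, window_score))
    return hits


# Hypothetical aligned sites, not taken from the paper.
example_sites = ["GGGAA", "GGGAC", "GGGAA", "GGGAT"]
example_pwm = build_pwm(example_sites)
print(scan(example_pwm, "TTGGGAATT", threshold=5.0))
```

Raising the threshold trades sensitivity for fewer false positives, which is the stringency trade-off the abstract refers to. Combining a PWM hit with a second, independent model is one way such false positives might be reduced further.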
Affiliation(s)
- Victor G Levitsky
- Institute of Cytology and Genetics SB RAS, Novosibirsk, 630090, Russia.
10
Hillgenberg M, Hofmann C, Stadler H, Löser P. High-efficiency system for the construction of adenovirus vectors and its application to the generation of representative adenovirus-based cDNA expression libraries. J Virol 2007; 80:5435-50. [PMID: 16699024 PMCID: PMC1472155 DOI: 10.1128/jvi.00218-06] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022]
Abstract
We here describe a convenient system for the production of recombinant adenovirus vectors and its use for the construction of a representative adenovirus-based cDNA expression library. The system is based on direct site-specific insertion of transgene cassettes into a replicating donor virus. The transgene is inserted into a donor plasmid containing the viral 5' inverted terminal repeat, the complete viral packaging signal, and a single loxP site. The plasmid is then transfected into a Cre recombinase-expressing packaging cell line that has been infected with a donor virus containing a partially deleted packaging signal flanked by loxP sites. Cre recombinase, by two steps of action, sequentially catalyzes the generation of a nonpackageable donor virus acceptor substrate and the generation of the desired recombinant adenovirus vector. Due to its growth impairment, residual donor virus can efficiently be counterselected during amplification of the recombinant adenovirus vector. By using this adenovirus construction system, a plasmid-based human liver cDNA library was converted by a single step into an adenovirus-based cDNA expression library with about 10^6 independent adenovirus clones. The high-titer purified library was shown to contain about 44% of full-length cDNAs with an average insert size of 1.3 kb. cDNAs of a gene expressed at a high level (human α1-antitrypsin) and a gene expressed at a relatively low level (human coagulation factor IX) in human liver were isolated from the adenovirus-based library using an enzyme-linked immunosorbent assay-based screening procedure.
11
Kowalska A, Bozsaky E, Ramsauer T, Rieder D, Bindea G, Lörch T, Trajanoski Z, Ambros PF. A new platform linking chromosomal and sequence information. Chromosome Res 2007; 15:327-39. [PMID: 17406992 DOI: 10.1007/s10577-007-1129-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/27/2006] [Revised: 01/24/2007] [Accepted: 01/24/2007] [Indexed: 10/23/2022]
Abstract
We have tested whether a direct correlation of sequence information and staining properties of chromosomes is possible and whether this combined information can be used to precisely map any position on the chromosome. Despite huge differences of compaction between the naked DNA and the DNA packed in chromosomes, we found a striking correlation when visualizing the GGCC density on both levels. Software was developed that allows one to superimpose chromosomal fluorescence intensity profiles generated by chromomycin A3 (CMA3) staining with GGCC density extracted from the Ensembl database. Thus, any position along the chromosome can be defined in megabase pairs (Mb) in addition to the cytoband information, enabling direct alignment of chromosomal information with the sequence data. The mapping tool was validated using 13 different BAC clones, resulting in a mean difference from Ensembl data of 2 Mb (ranging from 0.79 to 3.57 Mb). Our results indicate that the sequence density information and information gained with sequence-specific fluorochromes are superimposable. Thus, the visualized GGCC motif density along the chromosome (sequence bands) provides a unique platform for comparing different types of genomic information.
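The notion of a GGCC density profile along a chromosome can be illustrated with a small sketch that counts motif occurrences per fixed-size window. This is only an assumed, simplified reading of "GGCC density"; the authors' software, its window size and its smoothing are not described in the abstract. The function name, the default 1 Mb window and the toy sequence are hypothetical.

```python
def motif_density(sequence, motif="GGCC", window=1_000_000):
    """Count motif occurrences per window, assigning each hit to the window of its start."""
    sequence = sequence.upper()
    # Ceiling division: one bin per started window.
    counts = [0] * ((len(sequence) + window - 1) // window)
    position = sequence.find(motif)
    while position != -1:
        counts[position // window] += 1
        position = sequence.find(motif, position + 1)
    return counts


# Toy sequence with a tiny window so the result is easy to inspect by eye.
print(motif_density("GGCCAAGGCCTTTTGGCC", window=6))
```

Because GGCC is its own reverse complement, counting on one strand is enough. A real profile would be computed over a full chromosome sequence and compared with the measured fluorescence intensity, which this sketch does not attempt.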
Affiliation(s)
- Agata Kowalska
- CCRI, Children's Cancer Research Institute, St. Anna Kinderkrebsforschung, 1090, Vienna, Austria
12
Ganova-Raeva L, Zhang X, Cao F, Fields H, Khudyakov Y. Primer Extension Enrichment Reaction (PEER): a new subtraction method for identification of genetic differences between biological specimens. Nucleic Acids Res 2006; 34:e76. [PMID: 16790564 PMCID: PMC1484250 DOI: 10.1093/nar/gkl391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/01/2006] [Revised: 04/20/2006] [Accepted: 05/08/2006] [Indexed: 11/14/2022]
Abstract
We developed a conceptually new subtraction strategy for the detection and isolation of target DNA and/or RNA from complex nucleic acid mixtures, called Primer Extension Enrichment Reaction (PEER). PEER uses adapters and class IIS restriction enzymes to generate tagged oligonucleotides from dsDNA fragments derived from specimens containing an unknown target ('tester'). Subtraction is achieved by selectively disabling these oligonucleotides by extension reaction using ddNTPs and a double stranded DNA template generated from a pool of normal specimens ('driver'). Primers that do not acquire ddNTP are used to capture and amplify the unique target DNA from the original tester dsDNA. We successfully applied PEER to specimens containing known infectious agents (Hepatitis B Virus and Walrus Calicivirus) and demonstrated that it has higher efficiency than the best comparable technique. The strategy used for PEER is versatile and can be adapted for the identification of known and unknown pathogens and mutations, differential expression studies and other applications that allow the use of subtractive strategies.
Affiliation(s)
- Lilia Ganova-Raeva
- Centers for Disease Control and Prevention, National Center for Infectious Diseases, Division of Viral Hepatitis/Laboratory Branch, Atlanta, GA 30329, USA.
13
Kaminsky ZA, Popendikyte V, Assadzadeh A, Petronis A. Search for somatic DNA variation in the brain: investigation of the serotonin 2A receptor gene. Mamm Genome 2005; 16:587-93. [PMID: 16180140 DOI: 10.1007/s00335-005-0040-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/15/2005] [Accepted: 05/05/2005] [Indexed: 01/05/2023]
Abstract
Somatic DNA variation represents one of the most interesting but also one of the least investigated genetic phenomena. In addition to the classical case of DNA hypermutability at the V(D)J region, there is an increasing body of experimental evidence suggesting that genes other than immunoglobulin in tissues other than lymphocytes also exhibit nonuniformity of DNA sequence, which opens new opportunities for explaining various features of multicellular organisms. Identification of somatic DNA mutability, however, is not a trivial task, and numerous confounding factors have to be taken into account. In this work we investigated putative DNA variation in the serotonin 2A receptor gene (HTR2A). A series of real-time PCR-based experiments was performed on DNA samples (n = 8) from human brain and peripheral leukocytes. Amplification of the target DNA sequences was carefully matched to that of the control plasmid containing the insert of HTR2A. Sequencing of nearly 500 clones containing a total of 150,000 nucleotides did not show any evidence for somatic DNA variation in the brain and peripheral leukocytes. It is argued in this article that although intraindividual DNA mutability may be a more common phenomenon than is generally accepted, some of the earlier claims of genetic nonidentity of brain cells may be premature.
Affiliation(s)
- Zachary A Kaminsky
- The Krembil Family Epigenetics Laboratory, Centre for Addiction and Mental Health, 250 College Street, Toronto, Ontario, M5T 1R8, Canada
14
Fernandez-Fuentes N, Hermoso A, Espadaler J, Querol E, Aviles FX, Oliva B. Classification of common functional loops of kinase super-families. Proteins 2004; 56:539-55. [PMID: 15229886 DOI: 10.1002/prot.20136] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
A structural classification of loops has been obtained from a set of 141 protein structures classified as kinases. A total of 1813 loops were classified into 133 subclasses (9 ββ-links, 15 ββ-hairpins, 31 α-α, 46 α-β and 32 β-α). Functional information and specific features relating subclasses and function were included in the classification. Functional loops such as the P-loop (shared by different folds) or the Gly-rich loop, among others, were classified into structural motifs. As a result, a common mechanism of catalysis and substrate binding was proved for most kinases. Additionally, the multiple alignment of loop sequences made within each subclass was shown to be useful for comparative modeling of kinase loops. The classification is summarized in a kinase loop database located at http://sbi.imim.es/archki.
Affiliation(s)
- Narcis Fernandez-Fuentes
- Institut de Biotecnologia i Biomedicina and Department de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
15
Schadt EE, Edwards SW, GuhaThakurta D, Holder D, Ying L, Svetnik V, Leonardson A, Hart KW, Russell A, Li G, Cavet G, Castle J, McDonagh P, Kan Z, Chen R, Kasarskis A, Margarint M, Caceres RM, Johnson JM, Armour CD, Garrett-Engele PW, Tsinoremas NF, Shoemaker DD. A comprehensive transcript index of the human genome generated using microarrays and computational approaches. Genome Biol 2004; 5:R73. [PMID: 15461792 PMCID: PMC545593 DOI: 10.1186/gb-2004-5-10-r73] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.1] [Received: 05/04/2004] [Revised: 07/07/2004] [Accepted: 08/16/2004] [Indexed: 12/13/2022]
Abstract
BACKGROUND Computational and microarray-based experimental approaches were used to generate a comprehensive transcript index for the human genome. Oligonucleotide probes designed from approximately 50,000 known and predicted transcript sequences from the human genome were used to survey transcription from a diverse set of 60 tissues and cell lines using ink-jet microarrays. Further, expression activity over at least six conditions was more generally assessed using genomic tiling arrays consisting of probes tiled through a repeat-masked version of the genomic sequence making up chromosomes 20 and 22. RESULTS The combination of microarray data with extensive genome annotations resulted in a set of 28,456 experimentally supported transcripts. This set of high-confidence transcripts represents the first experimentally driven annotation of the human genome. In addition, the results from genomic tiling suggest that a large amount of transcription exists outside of annotated regions of the genome and serves as an example of how this activity could be measured on a genome-wide scale. CONCLUSIONS These data represent one of the most comprehensive assessments of transcriptional activity in the human genome and provide an atlas of human gene expression over a unique set of gene predictions. Before the annotation of the human genome is considered complete, however, the previously unannotated transcriptional activity throughout the genome must be fully characterized.
Affiliation(s)
- Eric E Schadt
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Stephen W Edwards
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Dan Holder
- Merck Research Laboratories, W42-213 Sumneytown Pike, POB 4, Westpoint, PA 19846, USA
- Lisa Ying
- Merck Research Laboratories, W42-213 Sumneytown Pike, POB 4, Westpoint, PA 19846, USA
- Vladimir Svetnik
- Merck Research Laboratories, W42-213 Sumneytown Pike, POB 4, Westpoint, PA 19846, USA
- Amy Leonardson
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Kyle W Hart
- Rally Scientific, 41 Fayette Street, Suite 1, Watertown, MA 02472, USA
- Archie Russell
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Guoya Li
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Guy Cavet
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- John Castle
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Paul McDonagh
- Amgen Inc, 1201 Amgen Court W, Seattle, WA 98119, USA
- Zhengyan Kan
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Ronghua Chen
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Andrew Kasarskis
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Mihai Margarint
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Ramon M Caceres
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Jason M Johnson
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Daniel D Shoemaker
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
16
Ji J, Zhao L, Wang X, Zhou C, Ding F, Su L, Zhang C, Mao X, Wu M, Liu Z. Differential expression of S100 gene family in human esophageal squamous cell carcinoma. J Cancer Res Clin Oncol 2004; 130:480-6. [PMID: 15185146 DOI: 10.1007/s00432-004-0555-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.6] [Received: 06/23/2003] [Accepted: 01/28/2004] [Indexed: 10/26/2022]
Abstract
PURPOSE To study the differential expression of the S100 gene family at the RNA level in human esophageal squamous cell carcinoma (ESCC), and to find the relationship of the S100 gene family with ESCC. METHODS First, specific primers were designed for the different S100 genes with Primer 3 software, with the requirement that the two primers for each S100 gene lie in two different exons. Then, the differential expression of 16 S100 genes was examined by semiquantitative reverse transcription-polymerase chain reaction (RT-PCR) in 62 cases of ESCC versus the corresponding normal esophageal mucosa. All RT-PCR products were analyzed on 1.5% agarose gels. The electrophoresis images were evaluated with a Fluor-S MultiImager and Multi-Analyst software, and statistical analysis was performed using SAS 8.1 software. RESULTS Eleven out of 16 S100 genes were significantly downregulated (p<0.05) in ESCC versus the normal counterparts, namely the S100A1, S100A2, S100A4, S100A8, S100A9, S100A10, S100A11, S100A12, S100A14, S100B, and S100P genes. Only the S100A7 gene in the S100 family was markedly upregulated (p<0.05). Moreover, the S100B gene was significantly correlated with histological differentiation of ESCC (p=0.0247), and the deregulation of some S100 genes was closely correlated (p<0.05), such as S100A10/S100A11, S100A2/S100A8, S100A2/S100A14, S100A8/S100A14, and S100A2/S100P. CONCLUSIONS The S100 gene family is closely associated with ESCC.
Affiliation(s)
- Junfang Ji
- National Lab of Molecular Oncology, Cancer Institute, Chinese Academy of Medical Sciences & Peking Union Medical College, 100021 Beijing, P.R. China
|
17
|
Attwood TK, Miller CJ. Progress in bioinformatics and the importance of being earnest. Biotechnol Annu Rev 2003; 8:1-54. [PMID: 12436914 DOI: 10.1016/s1387-2656(02)08003-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 02/27/2023]
Abstract
In silico biology has gathered momentum as, worldwide, scientists have united in a common quest to sequence, store and analyse complete genomes. This year, a pivotal achievement of this cooperative endeavour was realised in the release of a public draft of the human genome, and with it the promises to improve our understanding of diverse aspects of biology and to yield a healthier future with safe personalized medicines. Key to these goals will be the need to elucidate and characterise the genes and gene products encoded not just in the human genome, but in many genomes. These tasks are underpinned by the concepts and processes of genome and gene/protein evolution, regulation of gene expression, mechanisms of protein folding, the manifestation of protein function, and so on, all of which must be understood in the context of complex, dynamic biological systems. Our use of computers to model such concepts and systems must be placed in the context of the current limits of our understanding of them:- it is important to recognise, for example, that we don't have a common understanding either of what constitutes a gene or a protein function; we can't invariably say that a particular sequence or fold has arisen via divergent or convergent evolution; and we don't fully understand the rules of protein folding. Accepting what we can't do in silico is essential in appreciating what we can do. Without this understanding, it is easy to be misled, as notions of what particular computational approaches can achieve are sometimes rather optimistic. There are valuable lessons to be learned here from the field of Artificial Intelligence, principal among which is the realisation that capturing and representing complex knowledge is time consuming, expensive and hard. 
Thus, we argue here that if bioinformatics is to tackle biological complexity in earnest, it would be wise to absorb the experience distilled from decades of artificial intelligence research, and to approach the road ahead with caution, rigour and pragmatism.
Affiliation(s)
- T K Attwood
- School of Biological Sciences, Department of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PT, UK.
|
18
|
Tan JMM, Tock EPC, Chow VTK. The novel human MOST-1 (C8orf17) gene exhibits tissue specific expression, maps to chromosome 8q24.2, and is overexpressed/amplified in high grade cancers of the breast and prostate. Mol Pathol 2003; 56:109-15. [PMID: 12665628 PMCID: PMC1187302 DOI: 10.1136/mp.56.2.109] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 01/31/2023]
Abstract
AIMS To elucidate genes that participate in the process of oncogenesis, primers based on the E6 genes of genital human papillomaviruses (HPVs) were used to amplify potential expressed sequence tags (ESTs) from the MOLT-4 T lymphoblastic leukaemia cell line. METHODS Using the polymerase chain reaction (PCR) with human papillomavirus E6 gene primers, an EST from the MOLT-4 T lymphoblastic leukaemia cell line was amplified. Via rapid amplification of cDNA ends (RACE) and cycle sequencing from MOLT-4 and fetal lung cDNA libraries, overlapping cDNAs of 2786 bp and 2054 bp of the corresponding novel human intronless gene designated MOST-1 (for MOLT-4 sequence tag-1) were characterised and assigned the symbol C8orf17 by the HUGO Nomenclature Committee. RESULTS Both cDNAs contained a potential open reading frame (ORF) of 297 bp incorporating a methionine codon with an ideal Kozak consensus sequence for translation initiation, and encoding a putative hydrophilic polypeptide of 99 amino acids. Although reverse transcription PCR (RT-PCR) demonstrated MOST-1 expression in all 19 cancer and two normal cell lines tested, differential expression was seen in only nine of 16 normal tissues tested (heart, kidney, liver, pancreas, small intestine, ovary, testis, prostate, and thymus). A 388 bp fragment was amplified from the NS-1 mouse myeloma cell line, the sequence of which was identical to that within the MOST-1 ORF. The MOST-1 gene was mapped by fluorescent in situ hybridisation to chromosome 8q24.2, a region amplified in many breast cancers and prostate cancers, which is also the candidate site of potential oncogene(s) other than c-myc located at 8q24.1. Analysis of paired biopsies of invasive ductal breast cancer and adjacent normal tissue by semiquantitative and real time RT-PCR revealed average tumour to normal ratios of MOST-1 expression that were two times greater in grade 3 cancers than in grade 1 and 2 cancers. 
Quantitative real time PCR of archival prostatic biopsies displayed MOST-1 DNA values that were 9.9, 7.5, 4.2, and 1.4 times higher in high grade carcinomas, intermediate grade carcinomas, low grade carcinomas, and benign hyperplasias, respectively, than in normal samples. CONCLUSIONS These data suggest a role for MOST-1 in cellular differentiation, proliferation, and carcinogenesis.
Affiliation(s)
- J M M Tan
- Human Genome Laboratory, Department of Microbiology, Faculty of Medicine, National University of Singapore, Kent Ridge 117597, Singapore
|
19
|
Collins JE, Goward ME, Cole CG, Smink LJ, Huckle EJ, Knowles S, Bye JM, Beare DM, Dunham I. Reevaluating human gene annotation: a second-generation analysis of chromosome 22. Genome Res 2003; 13:27-36. [PMID: 12529303 PMCID: PMC430954 DOI: 10.1101/gr.695703] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Indexed: 11/25/2022]
Abstract
We report a second-generation gene annotation of human chromosome 22. Using expressed sequence databases, comparative sequence analysis, and experimental verification, we have extended genes, fused previously fragmented structures, and identified new genes. The total length in exons of annotation was increased by 74% over our previously published annotation and includes 546 protein-coding genes and 234 pseudogenes. Thirty-two potential protein-coding annotations are partial copies of other genes, and may represent duplications on an evolutionary path to change or loss of function. We also identified 31 non-protein-coding transcripts, including 16 possible antisense RNAs. By extrapolation, we estimate the human genome contains 29,000-36,000 protein-coding genes, 21,300 pseudogenes, and 1500 antisense RNAs. We suggest that our revised annotation criteria provide a paradigm for future annotation of the human genome.
Affiliation(s)
- John E Collins
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
|
20
|
Kochiwa H, Suzuki R, Washio T, Saito R, Bono H, Carninci P, Okazaki Y, Miki R, Hayashizaki Y, Tomita M. Inferring alternative splicing patterns in mouse from a full-length cDNA library and microarray data. Genome Res 2002; 12:1286-93. [PMID: 12176936 PMCID: PMC186638 DOI: 10.1101/gr.220302] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
Abstract
Although many studies on alternative splicing of specific genes have been reported in the literature, the general mechanism that regulates alternative splicing has not been clearly understood. In this study, we systematically aligned each pair of the 21,076 cDNA sequences of Mus musculus, searched for putative alternative splicing patterns, and constructed a list of potential alternative splicing sites. Two cDNAs are suspected to be alternatively spliced and originating from a common gene if they share most of their region with a high degree of sequence homology, but parts of the sequences are very distinctive or deleted in either cDNA. The list contains the following information: (1) tissue, (2) developmental stage, (3) sequences around splice sites, (4) the length of each gapped region, and (5) other comments. The list is available at http://www.bioinfo.sfc.keio.ac.jp/intron. Our results have predicted a number of unreported alternatively spliced genes, some of which are expressed only in a specific tissue or at a specific developmental stage.
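The pairwise criterion described in this abstract (two cDNAs share most of their sequence, but a stretch is distinctive or deleted in one of them) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `gapped_regions`, the thresholds `min_gap` and `min_shared`, and the use of Python's `difflib` in place of a real sequence aligner are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def gapped_regions(cdna_a, cdna_b, min_gap=10, min_shared=0.7):
    """Return candidate gapped regions between two cDNA sequences.

    Hypothetical illustration: a pair is considered only if the matched
    blocks cover at least `min_shared` of the shorter sequence; every
    non-matching stretch of at least `min_gap` bases is then reported as
    (tag, a_start, a_end, b_start, b_end).
    """
    matcher = SequenceMatcher(None, cdna_a, cdna_b, autojunk=False)
    shared = sum(block.size for block in matcher.get_matching_blocks())
    if shared / min(len(cdna_a), len(cdna_b)) < min_shared:
        return []  # too dissimilar to be treated as transcripts of one gene
    gaps = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        gap_len = max(a1 - a0, b1 - b0)
        if tag != "equal" and gap_len >= min_gap:
            gaps.append((tag, a0, a1, b0, b1))
    return gaps


if __name__ == "__main__":
    # Toy sequences (invented): the second transcript skips the middle exon.
    exon1 = "ATGGCTAGCTAGGATCCGAT"
    exon2 = "CCCCGGGGTTTTAAAA"
    exon3 = "GTACGTTAGCCATGCATTGA"
    print(gapped_regions(exon1 + exon2 + exon3, exon1 + exon3))
```

A real analysis would use a splice-aware aligner and would also record the sequences around each splice site and the length of each gapped region, as in the list the abstract describes.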
Affiliation(s)
- Hiromi Kochiwa
- Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa 252-8520, Japan
|
21
|
Harrison PM, Gerstein M. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J Mol Biol 2002; 318:1155-74. [PMID: 12083509 DOI: 10.1016/s0022-2836(02)00109-2] [Citation(s) in RCA: 145] [Impact Index Per Article: 6.6] [Indexed: 11/18/2022]
Abstract
Protein families can be used to understand many aspects of genomes, both their "live" and their "dead" parts (i.e. genes and pseudogenes). Surveys of genomes have revealed that, in every organism, there are always a few large families and many small ones, with the overall distribution following a power-law. This commonality is equally true for both genes and pseudogenes, and exists despite the fact that the specific families that are enlarged differ greatly between organisms. Furthermore, because of family structure there is great redundancy in proteomes, a fact linked to the large number of dispensable genes for each organism and the small size of the minimal, indispensable sub-proteome. Pseudogenes in prokaryotes represent families that are in the process of being dispensed with. In particular, the genome sequences of certain pathogenic bacteria (Mycobacterium leprae, Yersinia pestis and Rickettsia prowazekii) show how an organism can undergo reductive evolution on a large scale (i.e. the dying out of families) as a result of niche change. There appears to be less pressure to delete pseudogenes in eukaryotes. These can be divided into two varieties, duplicated and processed, where the latter involves reverse transcription from an mRNA intermediate. We discuss these collectively in yeast, worm, fly, and human. The fly has few pseudogenes apparently because of its high rate of genomic DNA deletion. In the other three organisms, the distribution of pseudogenes on the chromosome and amongst different families is highly non-uniform. Pseudogenes tend not to occur in the middle of chromosome arms, and tend to be associated with lineage-specific (as opposed to highly conserved) families that have environmental-response functions. This may be because, rather than being dead, they may form a reservoir of diverse "extra parts" that can be resurrected to help an organism adapt to its surroundings. 
In yeast, there may be a novel mechanism involving the [PSI+] prion that potentially enables this resurrection. In worm, the pseudogenes tend to arise out of families (e.g. chemoreceptors) that are greatly expanded in it compared to the fly. The human genome stands out in having many processed pseudogenes. These have a character very different from those of the duplicated variety, to a large extent just representing random insertions. Thus, their occurrence tends to be roughly in proportion to the amount of mRNA for a particular protein and to reflect the extent of the intergenic sequences. Further information about pseudogenes is available at http://genecensus.org/pseudogene
Affiliation(s)
- Paul M Harrison
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8114, USA
|
22
|
Xie H, Wasserman A, Levine Z, Novik A, Grebinskiy V, Shoshan A, Mintz L. Large-scale protein annotation through gene ontology. Genome Res 2002; 12:785-94. [PMID: 11997345 PMCID: PMC186564 DOI: 10.1101/gr.86902] [Citation(s) in RCA: 79] [Impact Index Per Article: 3.6] [Indexed: 11/25/2022]
Abstract
Recent progress in genomic sequencing, computational biology, and ontology development has presented an opportunity to investigate biological systems from a unique perspective, that is, examining genomes and transcriptomes through the multiple and hierarchical structure of Gene Ontology (GO). We report here our development of GO Engine, a computational platform for GO annotation, and analysis of the resultant GO annotations of human proteins. Protein annotation was centered on sequence homology with GO-annotated proteins and protein domain analysis. Text information analysis and a multiparameter cellular localization predictive tool were also used to increase the annotation accuracy, and to predict novel annotations. The majority of proteins corresponding to full-length mRNA in GenBank, and the majority of proteins in the NR database (nonredundant database of proteins) were annotated with one or more GO nodes in each of the three GO categories. The annotations of GenBank and SWISS-PROT proteins are available to the public at the GO Consortium web site.
Affiliation(s)
- Hanqing Xie
- Compugen Inc., Jamesburg, New Jersey 08831, USA.
|
23
|
Castresana J. Genes on human chromosome 19 show extreme divergence from the mouse orthologs and a high GC content. Nucleic Acids Res 2002; 30:1751-6. [PMID: 11937628 PMCID: PMC113201 DOI: 10.1093/nar/30.8.1751] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022] Open
Abstract
Mutational rates are known to be variable along the mammalian genome but the extent of this non-random fluctuation and its causes are less well understood. Using 5509 human and mouse orthologous genes with known chromosome positions, it is shown here that there are extreme differences in synonymous evolutionary rates between different human chromosomes when distances are measured using maximum-likelihood techniques. In particular, the average synonymous rate of genes located in human chromosome 19 is extremely high (K(s) = 1.243 substitutions/site) compared with the average of all genes (K(s) = 0.729), and significantly different from all other human chromosomes. When genes are sorted according to mouse chromosomes no such large differences are found. Strikingly, almost all genes of human chromosome 19 have very high GC content in humans but not in the mouse orthologs. More generally, correlation analysis shows that genes with very high GC content in humans have experienced the highest synonymous divergencies from the mouse. It is likely that, in such genes, the known relaxation of the isochore structure in rodents has caused an increased accumulation of synonymous substitutions in the mouse lineage, whereas the regions with the highest GC content in the human genome are accordingly maintained by a strong selective pressure.
Affiliation(s)
- Jose Castresana
- European Molecular Biology Laboratory (EMBL), Biocomputing Unit, Meyerhofstrasse 1, D-69117 Heidelberg, Germany.
|
24
|
Riechmann JL. Transcriptional regulation: a genomic overview. Arabidopsis Book 2002; 1:e0085. [PMID: 22303220 PMCID: PMC3243377 DOI: 10.1199/tab.0085] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 05/19/2023]
Abstract
The availability of the Arabidopsis thaliana genome sequence allows a comprehensive analysis of transcriptional regulation in plants using novel genomic approaches and methodologies. Such a genomic view of transcription first necessitates the compilation of lists of elements. Transcription factors are the most numerous of the different types of proteins involved in transcription in eukaryotes, and the Arabidopsis genome codes for more than 1,500 of them, or approximately 6% of its total number of genes. A genome-wide comparison of transcription factors across the three eukaryotic kingdoms reveals the evolutionary generation of diversity in the components of the regulatory machinery of transcription. However, as illustrated by Arabidopsis, transcription in plants follows similar basic principles and logic to those in animals and fungi. A global view and understanding of transcription at a cellular and organismal level requires the characterization of the Arabidopsis transcriptome and promoterome, as well as of the interactome, the localizome, and the phenome of the proteins involved in transcription.
Affiliation(s)
- José Luis Riechmann
- Mendel Biotechnology, 21375 Cabot Blvd., Hayward, CA 94545, USA
- California Institute of Technology, Division of Biology 156-29, Pasadena, CA 91125
|
25
|
Lipovich L, Hughes AL, King MC, Abkowitz JL, Quigley JG. Genomic structure and evolutionary context of the human feline leukemia virus subgroup C receptor (hFLVCR) gene: evidence for block duplications and de novo gene formation within duplicons of the hFLVCR locus. Gene 2002; 286:203-13. [PMID: 11943475 DOI: 10.1016/s0378-1119(02)00457-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
In this paper we sought to analyze the genomic structure and context of human feline leukemia virus subgroup C receptor (hFLVCR), a human glucarate transporter-like gene at chromosome 1q31, and compare it to that of a paralog (FLVCR14q) at chromosome 14q24. Splicing, polyadenylation, and expression patterns, as estimated by in silico analysis, differed between the two FLVCR genes despite their similar genomic structures, suggesting active and independent evolution of transcriptional and messenger RNA processing patterns after gene duplication. Promoter activity was bi-directional for hFLVCR, but not for its 14q paralog. The upstream 1q transcribed sequences were determined to comprise a novel gene of unknown function, LQK1. Annotation of contigs centered at hFLVCR and FLVCRL14q also revealed highly conserved gene clusters on chromosomes 1 and 14, inferred to result from a duplication. The clusters contained members of the FLVCR, Angel (KIAA0759), JDP, p21SNFT, and TGF-β families, as well as two uncharacterized families. The genome-wide locations of both previously recognized and four de novo in silico predicted genes belonging to these seven families were determined. Phylogenetic analyses of these families were consistent with the hypothesis that the 1q/14q duplication occurred early within, or immediately prior to the vertebrate divergence, after the protostome-deuterostome divergence but before the amniote-amphibian divergence.
MESH Headings
- 3' Untranslated Regions/genetics
- Alternative Splicing
- Animals
- Cats
- Chromosomes, Human, Pair 1/genetics
- Chromosomes, Human, Pair 14/genetics
- DNA, Complementary/chemistry
- DNA, Complementary/genetics
- Evolution, Molecular
- Gene Duplication
- Genes/genetics
- Humans
- Molecular Sequence Data
- Phylogeny
- Poly A/genetics
- Promoter Regions, Genetic/genetics
- Receptors, Virus/genetics
- Sequence Analysis, DNA
- Time Factors
- Transcription Initiation Site
Affiliation(s)
- Leonard Lipovich
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195-7710, USA
|
26
|
Abstract
Bioinformatics is an art and science concerned with the use of computing in biological research areas such as genomics, transcriptomics, proteomics, genetics, and evolution. This review paints a broad picture of bioinformatics, drawing examples from genomic sequencing and microarray analysis. I highlight the role of bioinformatics at multiple points along the path from high-tech data generation to biological discovery.
|
27
|
Harrison PM, Hegyi H, Balasubramanian S, Luscombe NM, Bertone P, Echols N, Johnson T, Gerstein M. Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. Genome Res 2002; 12:272-80. [PMID: 11827946 PMCID: PMC155275 DOI: 10.1101/gr.207102] [Citation(s) in RCA: 151] [Impact Index Per Article: 6.9] [Indexed: 11/25/2022]
Abstract
We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e., mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into "processed" and "nonprocessed"; the former are reverse transcribed from mRNA (and therefore have no intron structure), whereas the latter presumably arise from genomic duplications. We annotate putative processed pseudogenes based on whether there is a continuous span of homology that is >70% of the length of the closest matching human protein (i.e., with introns removed), or whether there is evidence of polyadenylation. We have applied our approach to chromosomes 21 and 22, the first parts of the human genome completely sequenced, finding 190 new pseudogene annotations beyond the 264 reported by the sequencing centers. In total, on chromosomes 21 and 22, there are 189 processed pseudogenes, 195 nonprocessed pseudogenes, and, additionally, 70 pseudogenic immunoglobulin gene segments. (Detailed assignments are available at http://bioinfo.mbb.yale.edu/genome/pseudogene or http://genecensus.org/pseudogene.) By extrapolation, we predict that there could be up to approximately 20,000 pseudogenes in the whole human genome, with a little more than half of them processed. We have determined the main populations and clusters of pseudogenes on chromosomes 21 and 22. There are notable excesses of pseudogenes relative to genes near the centromeres of both chromosomes, indicating the existence of pseudogenic "hot-spots" in the genome. We have looked at the distribution of InterPro families and Gene Ontology (GO) functional categories in our pseudogenes. 
Overall, the families in both processed and nonprocessed pseudogene populations occur according to a similar power-law distribution as that found for the occurrence of gene families, with a few big families and many small ones. The processed population is, in particular, enriched in highly expressed ribosomal-protein sequences (approximately 20%), which appear fairly evenly distributed across the chromosomes. We compared processed pseudogenes of different evolutionary ages, observing a high degree of similarity between "ancient" and "modern" subpopulations. This may be attributable to the consistently high expression of ribosomal proteins over evolutionary time. Finally, we find that chromosome 22 pseudogene population is dominated by immunoglobulin segments, which have a greater rate of disablement per amino acid than the other pseudogene populations and are also substantially more diverged.
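The annotation rule stated in this abstract can be written down as a small decision function. The sketch below is only an illustration of that rule as worded here, not the authors' software: the `Candidate` record, its field names, and the `classify` function are hypothetical, and the 0.70 cutoff is taken from the ">70% of the length" criterion in the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A genomic region similar to a known protein (hypothetical record)."""
    aligned_span: int         # length of the continuous homologous span, in residues
    protein_length: int       # length of the closest matching human protein
    has_polya_evidence: bool  # evidence of polyadenylation
    has_disablement: bool     # mid-sequence stop codon or frameshift


def classify(c: Candidate, span_cutoff: float = 0.70) -> str:
    """Label a candidate following the criteria quoted in the abstract."""
    if not c.has_disablement:
        return "not a pseudogene candidate"
    covers_protein = c.aligned_span / c.protein_length > span_cutoff
    if covers_protein or c.has_polya_evidence:
        return "processed"
    return "nonprocessed"


if __name__ == "__main__":
    print(classify(Candidate(80, 100, False, True)))
    print(classify(Candidate(50, 100, False, True)))
    # Consistency check on the counts reported for chromosomes 21 and 22:
    # processed + nonprocessed + immunoglobulin segments versus
    # new annotations + those reported by the sequencing centers.
    print(189 + 195 + 70 == 190 + 264)
```

The final line checks that the reported totals agree: both sides sum to 454 annotations.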
MESH Headings
- Chromosome Mapping/methods
- Chromosomes, Human, Pair 21/genetics
- Chromosomes, Human, Pair 22/genetics
- Evolution, Molecular
- Fossils
- Genes, Immunoglobulin
- Genes, Overlapping
- Genome, Human
- Humans
- Multigene Family
- Pseudogenes
- RNA Processing, Post-Transcriptional/genetics
- Sequence Analysis, DNA/statistics & numerical data
Affiliation(s)
- Paul M Harrison
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
|
28
|
Assaad FF. Of weeds and men: what genomes teach us about plant cell biology. Curr Opin Plant Biol 2001; 4:478-487. [PMID: 11641062 DOI: 10.1016/s1369-5266(00)00204-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 05/23/2023]
Abstract
It has generally been assumed that fundamental cellular processes are conserved at the molecular level. Genome comparisons, however, suggest that the molecular mechanisms underlying programmed cell death, defense, adaptation and development may differ considerably between the plant and animal kingdoms. Phylogenetic analyses have revealed a great deal of novelty in the plant genes that are implicated in conserved processes such as transcription, cytoskeletal dynamics and vesicle trafficking. The Arabidopsis genome highlights the highly dynamic and regulated nature of the plant cell, which is fine-tuned to light, water, nutrient availability, temperature, touch and wind.
Affiliation(s)
- F F Assaad
- Genetics and Microbiology Institute, Ludwig Maximilian University of Munich, Maria Ward Str. 1a, 80638 Munich, Germany.
|
29
|
Zhou G, Chen J, Lee S, Clark T, Rowley JD, Wang SM. The pattern of gene expression in human CD34(+) stem/progenitor cells. Proc Natl Acad Sci U S A 2001; 98:13966-71. [PMID: 11717454 PMCID: PMC61150 DOI: 10.1073/pnas.241526198] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.1] [Accepted: 10/04/2001] [Indexed: 11/18/2022] Open
Abstract
We have analyzed the pattern of gene expression in human primary CD34(+) stem/progenitor cells. We identified 42,399 unique serial analysis of gene expression (SAGE) tags among 106,021 SAGE tags collected from 2.5 × 10^6 CD34(+) cells purified from bone marrow. Of these unique SAGE tags, 21,546 matched known expressed sequences, including 3,687 known genes, and 20,854 were novel without a match. The SAGE tags that matched known sequences tended to be at higher levels, whereas the novel SAGE tags tended to be at lower levels. By using the generation of longer sequences from SAGE tags for gene identification (GLGI) method, we identified the correct gene for 385 of 440 high-copy SAGE tags that matched multiple genes and we generated 198 novel 3' expressed sequence tags from 138 high-copy novel SAGE tags. We observed that many different SAGE tags were derived from the same genes, reflecting the high heterogeneity of the 3' untranslated region in the expressed genes. We compared the quantitative relationship for genes known to be important in hematopoiesis. The qualitative identification and quantitative measure for each known gene, expressed sequence tag, and novel SAGE tag provide a base for studying normal gene expression in hematopoietic stem/progenitor cells and for studying abnormal gene expression in hematopoietic diseases.
Affiliation(s)
- G Zhou
- Department of Medicine, University of Chicago Medical Center, 5841 South Maryland Avenue, MC2115, Chicago, IL 60637, USA
|
30
|
Mattick JS. Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep 2001; 2:986-91. [PMID: 11713189 PMCID: PMC1084129 DOI: 10.1093/embo-reports/kve230] [Citation(s) in RCA: 536] [Impact Index Per Article: 23.3] [Received: 07/31/2001] [Revised: 09/10/2001] [Accepted: 09/11/2001] [Indexed: 11/14/2022] Open
Abstract
Around 98% of all transcriptional output in humans is non-coding RNA. RNA-mediated gene regulation is widespread in higher eukaryotes and complex genetic phenomena like RNA interference, co-suppression, transgene silencing, imprinting, methylation, and possibly position-effect variegation and transvection, all involve intersecting pathways based on or connected to RNA signaling. I suggest that the central dogma is incomplete, and that intronic and other non-coding RNAs have evolved to comprise a second tier of gene expression in eukaryotes, which enables the integration and networking of complex suites of gene activity. Although proteins are the fundamental effectors of cellular function, the basis of eukaryotic complexity and phenotypic variation may lie primarily in a control architecture composed of a highly parallel system of trans-acting RNAs that relay state information required for the coordination and modulation of gene expression, via chromatin remodeling, RNA-DNA, RNA-RNA and RNA-protein interactions. This system has interesting and perhaps informative analogies with small world networks and dataflow computing.
Affiliation(s)
- J S Mattick
- ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia.
|