Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wright FA, Lemon WJ, Zhao WD, Sears R, Zhuo D, Wang JP, Yang HY, Baer T, Stredney D, Spitzner J, Stutz A, Krahe R, Yuan B. A draft annotation and overview of the human genome. Genome Biol 2001;2:RESEARCH0025. [PMID: 11516338 PMCID: PMC55322 DOI: 10.1186/gb-2001-2-7-research0025] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.4] [Received: 02/12/2001] [Revised: 04/04/2001] [Accepted: 06/01/2001] [Indexed: 11/28/2022] Open

Cited by Other Article(s)

Rzeszutek I, Singh A. Small RNAs, Big Diseases. Int J Mol Sci 2020;21:E5699. [PMID: 32784829 PMCID: PMC7460979 DOI: 10.3390/ijms21165699] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/27/2020] [Revised: 08/06/2020] [Accepted: 08/08/2020] [Indexed: 02/06/2023] Open

Campbell MJ. Tales from topographic oceans: topologically associated domains and cancer. Endocr Relat Cancer 2019;26:R611-R626. [PMID: 31505466 PMCID: PMC7664306 DOI: 10.1530/erc-19-0348] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/23/2019] [Accepted: 09/09/2019] [Indexed: 01/03/2023]

Kingan SB, Urban J, Lambert CC, Baybayan P, Childers AK, Coates B, Scheffler B, Hackett K, Korlach J, Geib SM. A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system. Gigascience 2019;8:giz122. [PMID: 31609423 PMCID: PMC6791401 DOI: 10.1093/gigascience/giz122] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 05/09/2019] [Revised: 08/08/2019] [Accepted: 09/17/2019] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region.

RESULTS

The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig.

CONCLUSIONS

We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
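The coverage figure quoted in the results above follows from simple arithmetic: 82 Gb of sequence from unique library molecules over a 2.3 Gb assembly gives roughly 36×. The sketch below reproduces that calculation and illustrates the usual definition of contig N50 on a made-up list of contig lengths. The function names and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return sequenced_bases / genome_size


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Figures from the abstract: 82 Gb of unique-molecule reads, 2.3 Gb assembly.
coverage = fold_coverage(82e9, 2.3e9)

# Hypothetical contig lengths, used only to show how N50 is computed.
toy_contigs = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

print(f"coverage ~ {coverage:.1f}x, toy N50 = {toy_n50}")
```

On real data the same `n50` function would be applied to the contig lengths read from the assembly FASTA; the reported contig N50 of 1.5 Mb cannot be recomputed from the abstract alone.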

Abascal F, Juan D, Jungreis I, Kellis M, Martinez L, Rigau M, Rodriguez JM, Vazquez J, Tress ML. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res 2018;46:7070-7084. [PMID: 29982784 PMCID: PMC6101605 DOI: 10.1093/nar/gky587] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 05/15/2018] [Accepted: 06/18/2018] [Indexed: 12/16/2022] Open

Rich J, Ogryzko VV, Pirozhkova IV. Satellite DNA and related diseases. Biopolym Cell 2014. [DOI: 10.7124/bc.00089e] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/05/2022]

Dunham I, Beare DM, Collins JE. The characteristics of human genes: analysis of human chromosome 22. Comp Funct Genomics 2003;4:635-46. [PMID: 18629020 PMCID: PMC2447302 DOI: 10.1002/cfg.335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/15/2003] [Revised: 09/04/2003] [Accepted: 09/08/2003] [Indexed: 11/11/2022] Open

In-tube transfection improves the efficiency of gene transfer in primary neuronal cultures. J Neurosci Methods 2008;177:348-54. [PMID: 19014969 DOI: 10.1016/j.jneumeth.2008.10.023] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 08/30/2008] [Revised: 10/14/2008] [Accepted: 10/15/2008] [Indexed: 11/20/2022]

Meagher RB, Kandasamy MK, McKinney EC. Multicellular development and protein-protein interactions. Plant Signal Behav 2008;3:333-6. [PMID: 19841663 PMCID: PMC2634275 DOI: 10.4161/psb.3.5.5343] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/23/2007] [Accepted: 11/28/2007] [Indexed: 05/20/2023]

Levitsky VG, Ignatieva EV, Ananko EA, Turnaev II, Merkulova TI, Kolchanov NA, Hodgman TC. Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions. BMC Bioinformatics 2007;8:481. [PMID: 18093302 PMCID: PMC2265442 DOI: 10.1186/1471-2105-8-481] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Received: 07/22/2007] [Accepted: 12/19/2007] [Indexed: 12/22/2022] Open

Abstract

Background

Reliable transcription factor binding site (TFBS) prediction methods are essential for computer annotation of large amounts of genome sequence data. However, current methods to predict TFBSs are hampered by the high false-positive rates that occur when only sequence conservation at the core binding sites is considered.

Results

To improve this situation, we have quantified the performance of several Position Weight Matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to biomedically important TFBSs involved in the regulation of cell growth and proliferation as well as in inflammatory, immune, and antiviral responses (NF-κB, ISGF3, IRF1, STAT1), obesity and lipid metabolism (PPAR, SREBP, HNF4), and the regulation of steroidogenic (SF-1) and cell cycle (E2F) gene expression. We have also gained extra specificity using a method, entitled SiteGA, which takes into account structural interactions within TFBS core and flanking regions, using a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies.

To ensure a higher confidence in our approach, we applied resampling (jackknife and bootstrap) tests for the comparison; optimized PWM and SiteGA showed similar recognition performance. We then applied SiteGA and optimized PWMs (both separately and together) to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for binding sites (BSs) using the web tool, SiteGA.

Analysis of dependencies between close and distant LPDs revealed by SiteGA models has shown that the most significant correlations are between close LPDs, and are generally located in the core (footprint) region. A greater number of less significant correlations are mainly between distant LPDs, which spanned both core and flanking regions. When SiteGA and optimized PWM models were applied together, this substantially reduced false positives at least at higher stringencies.

Conclusion

Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It adds to the range of techniques available for TFBS prediction, and EPD analysis has led to a list of genes which appear to be regulated by the above TFs.
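As a rough illustration of what a Position Weight Matrix does in this setting, the sketch below builds a log-odds PWM from a few aligned sites and scans a sequence for the best-scoring window. It is a generic textbook PWM under stated assumptions (uniform background, a pseudocount of 1, hypothetical example sites and sequence); it is not the optimized PWM procedure or the SiteGA method described in the abstract.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds matrix from equal-length aligned sites (uniform background assumed)."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    windows = range(len(sequence) - width + 1)
    return max(
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in windows
    )


# Hypothetical aligned binding sites and target sequence, for illustration only.
example_sites = ["TGACTCA", "TGAGTCA", "TGACTCA"]
pwm = build_pwm(example_sites)
score, start = best_hit(pwm, "CCTGACTCAGG")
print(f"best window starts at {start} with score {score:.2f}")
```

A core-only score of this kind ignores dependencies between positions and flanking context, which is the kind of limitation the abstract attributes to methods that consider only core binding-site conservation.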

Hillgenberg M, Hofmann C, Stadler H, Löser P. High-efficiency system for the construction of adenovirus vectors and its application to the generation of representative adenovirus-based cDNA expression libraries. J Virol 2007;80:5435-50. [PMID: 16699024 PMCID: PMC1472155 DOI: 10.1128/jvi.00218-06] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022] Open

Kowalska A, Bozsaky E, Ramsauer T, Rieder D, Bindea G, Lörch T, Trajanoski Z, Ambros PF. A new platform linking chromosomal and sequence information. Chromosome Res 2007;15:327-39. [PMID: 17406992 DOI: 10.1007/s10577-007-1129-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/27/2006] [Revised: 01/24/2007] [Accepted: 01/24/2007] [Indexed: 10/23/2022]

Ganova-Raeva L, Zhang X, Cao F, Fields H, Khudyakov Y. Primer Extension Enrichment Reaction (PEER): a new subtraction method for identification of genetic differences between biological specimens. Nucleic Acids Res 2006;34:e76. [PMID: 16790564 PMCID: PMC1484250 DOI: 10.1093/nar/gkl391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/01/2006] [Revised: 04/20/2006] [Accepted: 05/08/2006] [Indexed: 11/14/2022] Open

Kaminsky ZA, Popendikyte V, Assadzadeh A, Petronis A. Search for somatic DNA variation in the brain: investigation of the serotonin 2A receptor gene. Mamm Genome 2005;16:587-93. [PMID: 16180140 DOI: 10.1007/s00335-005-0040-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/15/2005] [Accepted: 05/05/2005] [Indexed: 01/05/2023]

Fernandez-Fuentes N, Hermoso A, Espadaler J, Querol E, Aviles FX, Oliva B. Classification of common functional loops of kinase super-families. Proteins 2004;56:539-55. [PMID: 15229886 DOI: 10.1002/prot.20136] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]

Schadt EE, Edwards SW, GuhaThakurta D, Holder D, Ying L, Svetnik V, Leonardson A, Hart KW, Russell A, Li G, Cavet G, Castle J, McDonagh P, Kan Z, Chen R, Kasarskis A, Margarint M, Caceres RM, Johnson JM, Armour CD, Garrett-Engele PW, Tsinoremas NF, Shoemaker DD. A comprehensive transcript index of the human genome generated using microarrays and computational approaches. Genome Biol 2004;5:R73. [PMID: 15461792 PMCID: PMC545593 DOI: 10.1186/gb-2004-5-10-r73] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.1] [Received: 05/04/2004] [Revised: 07/07/2004] [Accepted: 08/16/2004] [Indexed: 12/13/2022] Open

Affiliation(s)

Eric E Schadt Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Stephen W Edwards Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Debraj GuhaThakurta Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Dan Holder Merck Research Laboratories, W42-213 Sumneytown Pike, POB 4, Westpoint, PA 19846, USA
Lisa Ying Merck Research Laboratories, W42-213 Sumneytown Pike, POB 4, Westpoint, PA 19846, USA
Vladimir Svetnik Merck Research Laboratories, W42-213 Sumneytown Pike, POB 4, Westpoint, PA 19846, USA
Amy Leonardson Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Kyle W Hart Rally Scientific, 41 Fayette Street, Suite 1, Watertown, MA 02472, USA
Archie Russell Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Guoya Li Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Guy Cavet Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
John Castle Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Paul McDonagh Amgen Inc, 1201 Amgen Court W, Seattle, WA 98119, USA
Zhengyan Kan Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Ronghua Chen Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Andrew Kasarskis Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Mihai Margarint Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Ramon M Caceres Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Jason M Johnson Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Christopher D Armour Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Philip W Garrett-Engele Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
Nicholas F Tsinoremas The Scripps Research Institute, Jupiter, FL 33458, USA
Daniel D Shoemaker Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA

Ji J, Zhao L, Wang X, Zhou C, Ding F, Su L, Zhang C, Mao X, Wu M, Liu Z. Differential expression of S100 gene family in human esophageal squamous cell carcinoma. J Cancer Res Clin Oncol 2004;130:480-6. [PMID: 15185146 DOI: 10.1007/s00432-004-0555-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.6] [Received: 06/23/2003] [Accepted: 01/28/2004] [Indexed: 10/26/2022]

Attwood TK, Miller CJ. Progress in bioinformatics and the importance of being earnest. Biotechnol Annu Rev 2003;8:1-54. [PMID: 12436914 DOI: 10.1016/s1387-2656(02)08003-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 02/27/2023]

Abstract

In silico biology has gathered momentum as, worldwide, scientists have united in a common quest to sequence, store and analyse complete genomes. This year, a pivotal achievement of this cooperative endeavour was realised in the release of a public draft of the human genome, and with it the promises to improve our understanding of diverse aspects of biology and to yield a healthier future with safe personalized medicines. Key to these goals will be the need to elucidate and characterise the genes and gene products encoded not just in the human genome, but in many genomes. These tasks are underpinned by the concepts and processes of genome and gene/protein evolution, regulation of gene expression, mechanisms of protein folding, the manifestation of protein function, and so on, all of which must be understood in the context of complex, dynamic biological systems. Our use of computers to model such concepts and systems must be placed in the context of the current limits of our understanding of them:- it is important to recognise, for example, that we don't have a common understanding either of what constitutes a gene or a protein function; we can't invariably say that a particular sequence or fold has arisen via divergent or convergent evolution; and we don't fully understand the rules of protein folding. Accepting what we can't do in silico is essential in appreciating what we can do. Without this understanding, it is easy to be misled, as notions of what particular computational approaches can achieve are sometimes rather optimistic. There are valuable lessons to be learned here from the field of Artificial Intelligence, principal among which is the realisation that capturing and representing complex knowledge is time consuming, expensive and hard. 
Thus, we argue here that if bioinformatics is to tackle biological complexity in earnest, it would be wise to absorb the experience distilled from decades of artificial intelligence research, and to approach the road ahead with caution, rigour and pragmatism.

Tan JMM, Tock EPC, Chow VTK. The novel human MOST-1 (C8orf17) gene exhibits tissue specific expression, maps to chromosome 8q24.2, and is overexpressed/amplified in high grade cancers of the breast and prostate. Mol Pathol 2003;56:109-15. [PMID: 12665628 PMCID: PMC1187302 DOI: 10.1136/mp.56.2.109] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 01/31/2023]

Abstract

AIMS

To elucidate genes that participate in the process of oncogenesis, primers based on the E6 genes of genital human papillomaviruses (HPVs) were used to amplify potential expressed sequence tags (ESTs) from the MOLT-4 T lymphoblastic leukaemia cell line.

METHODS

Using the polymerase chain reaction (PCR) with human papillomavirus E6 gene primers, an EST from the MOLT-4 T lymphoblastic leukaemia cell line was amplified. Via rapid amplification of cDNA ends (RACE) and cycle sequencing from MOLT-4 and fetal lung cDNA libraries, overlapping cDNAs of 2786 bp and 2054 bp of the corresponding novel human intronless gene designated MOST-1 (for MOLT-4 sequence tag-1) were characterised and assigned the symbol C8orf17 by the HUGO Nomenclature Committee.

RESULTS

Both cDNAs contained a potential open reading frame (ORF) of 297 bp incorporating a methionine codon with an ideal Kozak consensus sequence for translation initiation, and encoding a putative hydrophilic polypeptide of 99 amino acids. Although reverse transcription PCR (RT-PCR) demonstrated MOST-1 expression in all 19 cancer and two normal cell lines tested, differential expression was seen in only nine of 16 normal tissues tested (heart, kidney, liver, pancreas, small intestine, ovary, testis, prostate, and thymus). A 388 bp fragment was amplified from the NS-1 mouse myeloma cell line, the sequence of which was identical to that within the MOST-1 ORF. The MOST-1 gene was mapped by fluorescent in situ hybridisation to chromosome 8q24.2, a region amplified in many breast cancers and prostate cancers, which is also the candidate site of potential oncogene(s) other than c-myc located at 8q24.1. Analysis of paired biopsies of invasive ductal breast cancer and adjacent normal tissue by semiquantitative and real time RT-PCR revealed average tumour to normal ratios of MOST-1 expression that were two times greater in grade 3 cancers than in grade 1 and 2 cancers. Quantitative real time PCR of archival prostatic biopsies displayed MOST-1 DNA values that were 9.9, 7.5, 4.2, and 1.4 times higher in high grade carcinomas, intermediate grade carcinomas, low grade carcinomas, and benign hyperplasias, respectively, than in normal samples.

CONCLUSIONS

These data suggest a role for MOST-1 in cellular differentiation, proliferation, and carcinogenesis.

Collins JE, Goward ME, Cole CG, Smink LJ, Huckle EJ, Knowles S, Bye JM, Beare DM, Dunham I. Reevaluating human gene annotation: a second-generation analysis of chromosome 22. Genome Res 2003;13:27-36. [PMID: 12529303 PMCID: PMC430954 DOI: 10.1101/gr.695703] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Indexed: 11/25/2022]

Kochiwa H, Suzuki R, Washio T, Saito R, Bono H, Carninci P, Okazaki Y, Miki R, Hayashizaki Y, Tomita M. Inferring alternative splicing patterns in mouse from a full-length cDNA library and microarray data. Genome Res 2002;12:1286-93. [PMID: 12176936 PMCID: PMC186638 DOI: 10.1101/gr.220302] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]

Harrison PM, Gerstein M. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J Mol Biol 2002;318:1155-74. [PMID: 12083509 DOI: 10.1016/s0022-2836(02)00109-2] [Citation(s) in RCA: 145] [Impact Index Per Article: 6.6] [Indexed: 11/18/2022]

Abstract

Protein families can be used to understand many aspects of genomes, both their "live" and their "dead" parts (i.e. genes and pseudogenes). Surveys of genomes have revealed that, in every organism, there are always a few large families and many small ones, with the overall distribution following a power-law. This commonality is equally true for both genes and pseudogenes, and exists despite the fact that the specific families that are enlarged differ greatly between organisms. Furthermore, because of family structure there is great redundancy in proteomes, a fact linked to the large number of dispensable genes for each organism and the small size of the minimal, indispensable sub-proteome. Pseudogenes in prokaryotes represent families that are in the process of being dispensed with. In particular, the genome sequences of certain pathogenic bacteria (Mycobacterium leprae, Yersinia pestis and Rickettsia prowazekii) show how an organism can undergo reductive evolution on a large scale (i.e. the dying out of families) as a result of niche change. There appears to be less pressure to delete pseudogenes in eukaryotes. These can be divided into two varieties, duplicated and processed, where the latter involves reverse transcription from an mRNA intermediate. We discuss these collectively in yeast, worm, fly, and human. The fly has few pseudogenes apparently because of its high rate of genomic DNA deletion. In the other three organisms, the distribution of pseudogenes on the chromosome and amongst different families is highly non-uniform. Pseudogenes tend not to occur in the middle of chromosome arms, and tend to be associated with lineage-specific (as opposed to highly conserved) families that have environmental-response functions. This may be because, rather than being dead, they may form a reservoir of diverse "extra parts" that can be resurrected to help an organism adapt to its surroundings. 
In yeast, there may be a novel mechanism involving the [PSI+] prion that potentially enables this resurrection. In worm, the pseudogenes tend to arise out of families (e.g. chemoreceptors) that are greatly expanded in it compared to the fly. The human genome stands out in having many processed pseudogenes. These have a character very different from those of the duplicated variety, to a large extent just representing random insertions. Thus, their occurrence tends to be roughly in proportion to the amount of mRNA for a particular protein and to reflect the extent of the intergenic sequences. Further information about pseudogenes is available at http://genecensus.org/pseudogene

Xie H, Wasserman A, Levine Z, Novik A, Grebinskiy V, Shoshan A, Mintz L. Large-scale protein annotation through gene ontology. Genome Res 2002;12:785-94. [PMID: 11997345 PMCID: PMC186564 DOI: 10.1101/gr.86902] [Citation(s) in RCA: 79] [Impact Index Per Article: 3.6] [Indexed: 11/25/2022]

Castresana J. Genes on human chromosome 19 show extreme divergence from the mouse orthologs and a high GC content. Nucleic Acids Res 2002;30:1751-6. [PMID: 11937628 PMCID: PMC113201 DOI: 10.1093/nar/30.8.1751] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022] Open

Riechmann JL. Transcriptional regulation: a genomic overview. Arabidopsis Book 2002;1:e0085. [PMID: 22303220 PMCID: PMC3243377 DOI: 10.1199/tab.0085] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 05/19/2023]

Lipovich L, Hughes AL, King MC, Abkowitz JL, Quigley JG. Genomic structure and evolutionary context of the human feline leukemia virus subgroup C receptor (hFLVCR) gene: evidence for block duplications and de novo gene formation within duplicons of the hFLVCR locus. Gene 2002;286:203-13. [PMID: 11943475 DOI: 10.1016/s0378-1119(02)00457-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]

Goodman N. Biological data becomes computer literate: new advances in bioinformatics. Curr Opin Biotechnol 2002;13:68-71. [PMID: 11849961 DOI: 10.1016/s0958-1669(02)00287-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.5] [Indexed: 10/17/2022]

Harrison PM, Hegyi H, Balasubramanian S, Luscombe NM, Bertone P, Echols N, Johnson T, Gerstein M. Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. Genome Res 2002;12:272-80. [PMID: 11827946 PMCID: PMC155275 DOI: 10.1101/gr.207102] [Citation(s) in RCA: 151] [Impact Index Per Article: 6.9] [Indexed: 11/25/2022]

Abstract

We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e., mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into "processed" and "nonprocessed"; the former are reverse transcribed from mRNA (and therefore have no intron structure), whereas the latter presumably arise from genomic duplications. We annotate putative processed pseudogenes based on whether there is a continuous span of homology that is >70% of the length of the closest matching human protein (i.e., with introns removed), or whether there is evidence of polyadenylation. We have applied our approach to chromosomes 21 and 22, the first parts of the human genome completely sequenced, finding 190 new pseudogene annotations beyond the 264 reported by the sequencing centers. In total, on chromosomes 21 and 22, there are 189 processed pseudogenes, 195 nonprocessed pseudogenes, and, additionally, 70 pseudogenic immunoglobulin gene segments. (Detailed assignments are available at http://bioinfo.mbb.yale.edu/genome/pseudogene or http://genecensus.org/pseudogene.) By extrapolation, we predict that there could be up to approximately 20,000 pseudogenes in the whole human genome, with a little more than half of them processed. We have determined the main populations and clusters of pseudogenes on chromosomes 21 and 22. There are notable excesses of pseudogenes relative to genes near the centromeres of both chromosomes, indicating the existence of pseudogenic "hot-spots" in the genome. We have looked at the distribution of InterPro families and Gene Ontology (GO) functional categories in our pseudogenes. 
Overall, the families in both processed and nonprocessed pseudogene populations occur according to a power-law distribution similar to that found for the occurrence of gene families, with a few big families and many small ones. The processed population is, in particular, enriched in highly expressed ribosomal-protein sequences (approximately 20%), which appear fairly evenly distributed across the chromosomes. We compared processed pseudogenes of different evolutionary ages, observing a high degree of similarity between "ancient" and "modern" subpopulations. This may be attributable to the consistently high expression of ribosomal proteins over evolutionary time. Finally, we find that the chromosome 22 pseudogene population is dominated by immunoglobulin segments, which have a greater rate of disablement per amino acid than the other pseudogene populations and are also substantially more diverged.
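The annotation rule stated in this abstract (a continuous homology span longer than 70% of the closest matching protein, or evidence of polyadenylation, marks a pseudogene as processed) can be written as a short predicate, and the reported counts can be checked for internal consistency. The sketch below is a minimal reading of the abstract only: the function name, its arguments, and the decision to label everything else "nonprocessed" are assumptions, and immunoglobulin gene segments, which the abstract counts separately, are not modeled.

```python
def classify_pseudogene(span_aa, protein_length_aa, has_polya, threshold=0.70):
    """Label a candidate as 'processed' or 'nonprocessed' per the abstract's rule.

    span_aa: length of the continuous homology span (hypothetical input).
    protein_length_aa: length of the closest matching human protein.
    has_polya: whether there is evidence of polyadenylation.
    """
    if protein_length_aa <= 0:
        raise ValueError("protein length must be positive")
    if has_polya or span_aa / protein_length_aa > threshold:
        return "processed"
    return "nonprocessed"


# Counts quoted in the abstract for chromosomes 21 and 22.
processed, nonprocessed, immunoglobulin_segments = 189, 195, 70
new_annotations, previously_reported = 190, 264

total_by_type = processed + nonprocessed + immunoglobulin_segments
total_by_source = new_annotations + previously_reported

print(classify_pseudogene(80, 100, has_polya=False))
print(total_by_type, total_by_source)
```

Both ways of summing the abstract's figures give the same total, which is a useful sanity check when transcribing the counts.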

Assaad FF. Of weeds and men: what genomes teach us about plant cell biology. Curr Opin Plant Biol 2001;4:478-487. [PMID: 11641062 DOI: 10.1016/s1369-5266(00)00204-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 05/23/2023]

Zhou G, Chen J, Lee S, Clark T, Rowley JD, Wang SM. The pattern of gene expression in human CD34(+) stem/progenitor cells. Proc Natl Acad Sci U S A 2001;98:13966-71. [PMID: 11717454 PMCID: PMC61150 DOI: 10.1073/pnas.241526198] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.1] [Accepted: 10/04/2001] [Indexed: 11/18/2022] Open

Mattick JS. Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep 2001;2:986-91. [PMID: 11713189 PMCID: PMC1084129 DOI: 10.1093/embo-reports/kve230] [Citation(s) in RCA: 536] [Impact Index Per Article: 23.3] [Received: 07/31/2001] [Revised: 09/10/2001] [Accepted: 09/11/2001] [Indexed: 11/14/2022] Open