1
Langmead B, Wilks C, Antonescu V, Charles R. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics 2019; 35:421-432. [PMID: 30020410 PMCID: PMC6361242 DOI: 10.1093/bioinformatics/bty648] [Citation(s) in RCA: 331] [Impact Index Per Article: 66.2] [Received: 02/10/2018] [Accepted: 07/17/2018] [Indexed: 12/27/2022]
Abstract
Motivation: General-purpose processors can now contain many dozens of processor cores and support hundreds of simultaneous threads of execution. To make best use of these threads, genomics software must contend with new and subtle computer architecture issues. We discuss some of these and propose methods for improving thread scaling in tools that analyze each read independently, such as read aligners. Results: We implement these methods in new versions of Bowtie, Bowtie 2 and HISAT. We greatly improve thread scaling in many scenarios, including on the recent Intel Xeon Phi architecture. We also highlight how bottlenecks are exacerbated by variable-record-length file formats like FASTQ and suggest changes that enable superior scaling. Availability and implementation: Experiments for this study: https://github.com/BenLangmead/bowtie-scaling. Bowtie: http://bowtie-bio.sourceforge.net. Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2. HISAT: http://www.ccb.jhu.edu/software/hisat. Supplementary information: Supplementary data are available at Bioinformatics online.
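To illustrate why a variable-record-length format such as FASTQ can be harder to split among threads than a fixed-length layout, here is a minimal sketch. It is not the authors' implementation: the function names, the simple four-line FASTQ assumption, and the padded fixed-width layout are all hypothetical and chosen only for illustration.

```python
import io

def fastq_records(handle):
    """Yield (name, seq, qual) from simple 4-line FASTQ.

    Record boundaries are only known after reading every line,
    so the input has to be scanned sequentially.
    """
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), seq, qual

def fixed_record(buffer, index, record_len):
    """Return the index-th record of a buffer of equal-length records.

    The start offset is computed directly, with no scanning.
    """
    start = index * record_len
    return buffer[start:start + record_len]

fastq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCCA\n+\nIIIII\n")
reads = list(fastq_records(fastq))

# Hypothetical fixed-width layout: each sequence padded to 5 characters with '-'.
padded = "ACGT-GGCCA"
second = fixed_record(padded, 1, 5)
```

In the fixed-width case any thread can jump straight to its own records by arithmetic on the offset, whereas the FASTQ reader must find each newline before it knows where the next record begins.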
Affiliation(s)
- Ben Langmead: Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Christopher Wilks: Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Valentin Antonescu: Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Rone Charles: Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
2
Sherman RM, Forman J, Antonescu V, Puiu D, Daya M, Rafaels N, Boorgula MP, Chavan S, Vergara C, Ortega VE, Levin AM, Eng C, Yazdanbakhsh M, Wilson JG, Marrugo J, Lange LA, Williams LK, Watson H, Ware LB, Olopade CO, Olopade O, Oliveira RR, Ober C, Nicolae DL, Meyers DA, Mayorga A, Knight-Madden J, Hartert T, Hansel NN, Foreman MG, Ford JG, Faruque MU, Dunston GM, Caraballo L, Burchard EG, Bleecker ER, Araujo MI, Herrera-Paz EF, Campbell M, Foster C, Taub MA, Beaty TH, Ruczinski I, Mathias RA, Barnes KC, Salzberg SL. Author Correction: Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat Genet 2019; 51:364. [PMID: 30647471 DOI: 10.1038/s41588-018-0335-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
In the version of this article initially published, the statement "there are no pan-genomes for any other animal or plant species" was incorrect. The statement has been corrected to "there are no reported pan-genomes for any other animal species, to our knowledge." We thank David Edwards for bringing this error to our attention. The error has been corrected in the HTML and PDF versions of the article.
Affiliation(s)
- Rachel M Sherman: Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Juliet Forman: Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Departments of Computer Science, Biology, and Mathematics, Harvey Mudd College, Claremont, CA, USA
- Valentin Antonescu: Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Daniela Puiu: Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Michelle Daya: Department of Medicine, University of Colorado Denver, Aurora, CO, USA
- Nicholas Rafaels: Department of Medicine, University of Colorado Denver, Aurora, CO, USA
- Sameer Chavan: Department of Medicine, University of Colorado Denver, Aurora, CO, USA
- Victor E Ortega: Department of Internal Medicine, Section on Pulmonary, Critical Care, Allergy and Immunologic Diseases, Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Albert M Levin: Department of Public Health Sciences, Henry Ford Health System, Detroit, MI, USA
- Celeste Eng: Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Maria Yazdanbakhsh: Department of Parasitology, Leiden University Medical Center, Leiden, the Netherlands
- James G Wilson: Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
- Javier Marrugo: Institute for Immunological Research, Universidad de Cartagena, Cartagena, Colombia
- Leslie A Lange: Department of Medicine, University of Colorado Denver, Aurora, CO, USA
- L Keoki Williams: Department of Internal Medicine, Henry Ford Health System, Detroit, MI, USA
- Harold Watson: Faculty of Medical Sciences Cave Hill Campus, The University of the West Indies, Bridgetown, Barbados
- Lorraine B Ware: Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Christopher O Olopade: Department of Medicine and Center for Global Health, University of Chicago, Chicago, IL, USA
- Ricardo R Oliveira: Laboratório de Patologia Experimental, Centro de Pesquisas Gonçalo Moniz, Salvador, Brazil
- Carole Ober: Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Dan L Nicolae: Department of Medicine, University of Chicago, Chicago, IL, USA
- Deborah A Meyers: Department of Medicine, University of Arizona College of Medicine, Tucson, AZ, USA
- Alvaro Mayorga: Centro de Neumologia y Alergias, San Pedro Sula, Honduras
- Jennifer Knight-Madden: Caribbean Institute for Health Research, The University of the West Indies, Kingston, Jamaica
- Tina Hartert: Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Nadia N Hansel: Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Marilyn G Foreman: Pulmonary and Critical Care Medicine, Morehouse School of Medicine, Atlanta, GA, USA
- Jean G Ford: Department of Medicine, Einstein Medical Center, Philadelphia, PA, USA
- Mezbah U Faruque: National Human Genome Center, Howard University College of Medicine, Washington, DC, USA
- Georgia M Dunston: Department of Microbiology, Howard University College of Medicine, Washington, DC, USA
- Luis Caraballo: Institute for Immunological Research, Universidad de Cartagena, Cartagena, Colombia
- Esteban G Burchard: Departments of Bioengineering & Therapeutic Sciences and Medicine, University of California, San Francisco, San Francisco, CA, USA
- Eugene R Bleecker: Department of Medicine, University of Arizona College of Medicine, Tucson, AZ, USA
- Maria I Araujo: Immunology Service, Universidade Federal da Bahia, Salvador, Brazil
- Edwin F Herrera-Paz: Facultad de Ciencias Médicas, Universidad Tecnológica Centroamericana (UNITEC), Tegucigalpa, Honduras
- Monica Campbell: Department of Medicine, University of Colorado Denver, Aurora, CO, USA
- Cassandra Foster: Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Margaret A Taub: Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Terri H Beaty: Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Ingo Ruczinski: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Rasika A Mathias: Department of Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Kathleen C Barnes: Department of Medicine, University of Colorado Denver, Aurora, CO, USA
- Steven L Salzberg: Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
3
Parry EM, Gable DL, Stanley SE, Khalil SE, Antonescu V, Florea L, Armanios M. Germline Mutations in DNA Repair Genes in Lung Adenocarcinoma. J Thorac Oncol 2017; 12:1673-1678. [PMID: 28843361 DOI: 10.1016/j.jtho.2017.08.011] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 07/14/2017] [Revised: 07/31/2017] [Accepted: 08/05/2017] [Indexed: 12/13/2022]
Abstract
INTRODUCTION: Although lung cancer is generally thought to be environmentally provoked, anecdotal familial clustering has been reported, suggesting that there may be genetic susceptibility factors. We systematically tested whether germline mutations in eight candidate genes may be risk factors for lung adenocarcinoma. METHODS: We studied lung adenocarcinoma cases for which germline sequence data had been generated as part of The Cancer Genome Atlas project but had not been previously analyzed. We selected eight genes, ATM serine/threonine kinase gene (ATM), BRCA2, DNA repair associated gene (BRCA2), checkpoint kinase 2 gene (CHEK2), EGFR, parkin RBR E3 ubiquitin protein ligase gene (PARK2), telomerase reverse transcriptase gene (TERT), tumor protein p53 gene (TP53), and Yes associated protein 1 gene (YAP1), on the basis of prior anecdotal association with lung cancer or genome-wide association studies. RESULTS: Among 555 lung adenocarcinoma cases, we detected 14 pathogenic mutations in five genes; they occurred at a frequency of 2.5% and represented an OR of 66 (95% confidence interval: 33-125, p < 0.0001 [chi-square test]). The mutations fell most commonly in ATM (50%), followed by TP53, BRCA2, EGFR, and PARK2. Most (86%) of these variants had been reported in other familial cancer syndromes. Another 12 cases (2%) carried ultrarare variants that were predicted to be deleterious by three protein prediction programs; these most frequently involved ATM and BRCA2. CONCLUSIONS: A subset of patients with lung adenocarcinoma, at least 2.5% to 4.5%, carry germline variants that have been linked to cancer risk in Mendelian syndromes. The genes fall most frequently in DNA repair pathways. Our data indicate that patients with lung adenocarcinoma, similar to other solid tumors, include a subset of patients with inherited susceptibility.
Affiliation(s)
- Erin M Parry: Osler Medical Housestaff Training Program, Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Dustin L Gable: Medical Scientist Training Program, Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Susan E Stanley: Medical Scientist Training Program, Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Sara E Khalil: Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Valentin Antonescu: Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Liliana Florea: Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Mary Armanios: Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland; Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
5
Sun Z, Ke X, Salzberg SL, Kim D, Antonescu V, Cheng Y, Huang B, Song JH, Abraham JM, Ibrahim S, Tian H, Meltzer SJ. The novel fusion transcript NR5A2-KLHL29FT is generated by an insertion at the KLHL29 locus. Cancer 2017; 123:1507-1515. [PMID: 28081303 DOI: 10.1002/cncr.30510] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/24/2016] [Revised: 11/08/2016] [Accepted: 11/21/2016] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Novel fusion transcripts (FTs) caused by chromosomal rearrangement are common factors in the development of cancers. In the current study, the authors used massively parallel RNA sequencing to identify new FTs in colon cancers. METHODS: RNA sequencing (RNA-Seq) and TopHat-Fusion were used to identify new FTs in colon cancers. The authors then investigated whether the novel FT nuclear receptor subfamily 5, group A, member 2 (NR5A2)-Kelch-like family member 29 FT (KLHL29FT) was transcribed from a genomic chromosomal rearrangement. Next, the expression of NR5A2-KLHL29FT was measured by quantitative real-time polymerase chain reaction in colon cancers and matched corresponding normal epithelia. RESULTS: The authors identified the FT NR5A2-KLHL29FT in normal and cancerous epithelia. Investigation of this transcript unexpectedly showed that it was due to an uncharacterized polymorphic germline insertion of the NR5A2 sequence from chromosome 1 into the KLHL29 locus at chromosome 2, rather than a chromosomal rearrangement. This germline insertion, which occurred at a population frequency of 0.40, appeared to bear no relationship to cancer development. Moreover, expression of NR5A2-KLHL29FT was validated in RNA specimens from samples with insertions of NR5A2 at the KLHL29 gene locus, but not from samples without this insertion. It is interesting to note that NR5A2-KLHL29FT expression levels were significantly lower in colon cancers than in matched normal colonic epithelia (P = .029), suggesting the potential participation of NR5A2-KLHL29FT in the origin or progression of this tumor type. CONCLUSIONS: NR5A2-KLHL29FT was generated from a polymorphic insertion of the NR5A2 sequence into the KLHL29 locus. NR5A2-KLHL29FT may influence the origin or progression of colon cancer. Moreover, researchers should be aware that similar FTs may occur due to transchromosomal insertions that are not correctly annotated in genome databases, especially with current assembly algorithms. Cancer 2017;123:1507-1515. © 2017 American Cancer Society.
Affiliation(s)
- Zhenguo Sun: Department of Thoracic Surgery, Shandong University Qilu Hospital, Jinan, Shandong, China; Division of Gastroenterology, The Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Medicine, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, Maryland
- Xiquan Ke: Division of Gastroenterology, The Johns Hopkins University School of Medicine, Baltimore, Maryland
- Steven L Salzberg: Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, The Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Biostatistics, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, Maryland
- Daehwan Kim: Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, The Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Biostatistics, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, Maryland
- Valentin Antonescu: Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, The Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Biostatistics, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, Maryland
- Yulan Cheng: Division of Gastroenterology, The Johns Hopkins University School of Medicine, Baltimore, Maryland
- Binbin Huang: Division of Gastroenterology, The Johns Hopkins University School of Medicine, Baltimore, Maryland
- Jee Hoon Song: Division of Gastroenterology, The Johns Hopkins University School of Medicine, Baltimore, Maryland
- John M Abraham: Division of Gastroenterology, The Johns Hopkins University School of Medicine, Baltimore, Maryland
- Sariat Ibrahim: Division of Gastroenterology, The Johns Hopkins University School of Medicine, Baltimore, Maryland
- Hui Tian: Department of Thoracic Surgery, Shandong University Qilu Hospital, Jinan, Shandong, China
- Stephen J Meltzer: Division of Gastroenterology, The Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Medicine, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, Maryland
6
Cannon EKS, Birkett SM, Braun BL, Kodavali S, Jennewein DM, Yilmaz A, Antonescu V, Antonescu C, Harper LC, Gardiner JM, Schaeffer ML, Campbell DA, Andorf CM, Andorf D, Lisch D, Koch KE, McCarty DR, Quackenbush J, Grotewold E, Lushbough CM, Sen TZ, Lawrence CJ. POPcorn: An Online Resource Providing Access to Distributed and Diverse Maize Project Data. Int J Plant Genomics 2011; 2011:923035. [PMID: 22253616 PMCID: PMC3255282 DOI: 10.1155/2011/923035] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/16/2011] [Accepted: 11/29/2011] [Indexed: 05/21/2023]
Abstract
The purpose of the online resource presented here, POPcorn (Project Portal for corn), is to enhance accessibility of maize genetic and genomic resources for plant biologists. Currently, many online locations are difficult to find, some are best searched independently, and individual project websites often degrade over time, sometimes disappearing entirely. The POPcorn site makes available (1) a centralized, web-accessible resource to search and browse descriptions of ongoing maize genomics projects, (2) a single, stand-alone tool that uses web services and minimal data warehousing to search for sequence matches in online resources of diverse offsite projects, and (3) a set of tools that enables researchers to migrate their data to the long-term model organism database for maize genetic and genomic information: MaizeGDB. Examples demonstrating POPcorn's utility are provided herein.
Affiliation(s)
- Ethalinda K. S. Cannon: Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Scott M. Birkett: Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Bremen L. Braun: USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- Sateesh Kodavali: Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Douglas M. Jennewein: Department of Computer Science, University of South Dakota, Vermillion, SD 57069, USA
- Alper Yilmaz: Plant Biotechnology Center and Department of Molecular Genetics, The Ohio State University, Columbus, OH 43210, USA
- Valentin Antonescu: Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Sm822, Boston, MA 02215, USA
- Corina Antonescu: Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Sm822, Boston, MA 02215, USA
- Lisa C. Harper: USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA; USDA-ARS Plant Gene Expression Center, Albany, CA 94710, USA; Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Jack M. Gardiner: Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA; School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
- Mary L. Schaeffer: USDA-ARS Plant Genetics Research Unit, University of Missouri, Columbia, MO 65211, USA; Division of Plant Sciences, Department of Agronomy, University of Missouri, Columbia, MO 65211, USA
- Darwin A. Campbell: USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- Carson M. Andorf: USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- Destri Andorf: Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Damon Lisch: Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Karen E. Koch: Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Donald R. McCarty: Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- John Quackenbush: Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Sm822, Boston, MA 02215, USA
- Erich Grotewold: Plant Biotechnology Center and Department of Molecular Genetics, The Ohio State University, Columbus, OH 43210, USA
- Carol M. Lushbough: Department of Computer Science, University of South Dakota, Vermillion, SD 57069, USA
- Taner Z. Sen: Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA; USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- Carolyn J. Lawrence: Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA; USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
7
Abstract
The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a home page. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
8
Tsai J, Sultana R, Lee Y, Pertea G, Karamycheva S, Antonescu V, Cho J, Parvizi B, Cheung F, Quackenbush J. RESOURCERER: a database for annotating and linking microarray resources within and across species. Genome Biol 2005; 2:SOFTWARE0002. [PMID: 16173164 PMCID: PMC138985 DOI: 10.1186/gb-2001-2-11-software0002] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.5] [Indexed: 11/16/2022]
Abstract
Microarray expression analysis is providing unprecedented data on gene expression in humans and mammalian model systems. Although such studies provide a tremendous resource for understanding human disease states, one of the significant challenges is cross-referencing the data derived from different species, across diverse expression analysis platforms, in order to properly derive inferences regarding gene expression and disease state. To address this problem, we have developed RESOURCERER, a microarray-resource annotation and cross-reference database built using the analysis of expressed sequence tags (ESTs) and gene sequences provided by the TIGR Gene Index (TGI) and TIGR Orthologous Gene Alignment (TOGA) databases [now called Eukaryotic Gene Orthologs (EGO)].
Affiliation(s)
- Jennifer Tsai: The Institute for Genomic Research, Rockville, MD 20850, USA
- Razvan Sultana: The Institute for Genomic Research, Rockville, MD 20850, USA
- Yudan Lee: The Institute for Genomic Research, Rockville, MD 20850, USA
- Geo Pertea: The Institute for Genomic Research, Rockville, MD 20850, USA
- Jennifer Cho: The Institute for Genomic Research, Rockville, MD 20850, USA
- Babak Parvizi: The Institute for Genomic Research, Rockville, MD 20850, USA
- Foo Cheung: The Institute for Genomic Research, Rockville, MD 20850, USA
9
Lee Y, Tsai J, Sunkara S, Karamycheva S, Pertea G, Sultana R, Antonescu V, Chan A, Cheung F, Quackenbush J. The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes. Nucleic Acids Res 2005; 33:D71-4. [PMID: 15608288 PMCID: PMC540018 DOI: 10.1093/nar/gki064] [Citation(s) in RCA: 159] [Impact Index Per Article: 8.4] [Indexed: 11/12/2022]
Abstract
Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene and EST sequences in an attempt to identify and characterize expressed transcripts and to present them on the Web in a user-friendly, consistent fashion. A Gene Index database is constructed for each selected organism by first clustering, then assembling EST and annotated cDNA and gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to genetic and physical maps, to provide links to orthologous and paralogous genes, and as a resource for comparative and functional genomic analysis.
Affiliation(s)
- Y Lee
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
10
|
Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, Tsai J, Quackenbush J. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 2003; 19:651-2. [PMID: 12651724 DOI: 10.1093/bioinformatics/btg034] [Citation(s) in RCA: 1329] [Impact Index Per Article: 63.3] [Indexed: 11/14/2022] Open
Abstract
TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.
Affiliation(s)
- Geo Pertea
- The Institute for Genomic Research, Rockville, MD 20850, USA
|
11
|
Lee Y, Sultana R, Pertea G, Cho J, Karamycheva S, Tsai J, Parvizi B, Cheung F, Antonescu V, White J, Holt I, Liang F, Quackenbush J. Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res 2002; 12:493-502. [PMID: 11875039 PMCID: PMC155294 DOI: 10.1101/gr.212002] [Citation(s) in RCA: 122] [Impact Index Per Article: 5.5] [Indexed: 01/07/2023]
Abstract
Comparative genomics promises to rapidly accelerate the identification and functional classification of biologically important human genes. We developed the TIGR Orthologous Gene Alignment (TOGA; <http://www.tigr.org/tdb/toga/toga.shtml>) database to provide a cross-reference between fully and partially sequenced eukaryotic transcribed sequences. Starting with the assembled expressed sequence tag (EST) and gene sequences that comprise the 28 TIGR Gene Indices, we used high-stringency pair-wise sequence searches and a reflexive, transitive closure process to associate sequence-specific best hits, generating 32,652 tentative ortholog groups (TOGs). This has allowed us to identify putative orthologs and paralogs for known genes, as well as those that exist only as uncharacterized ESTs and to provide links to additional information including genome sequence and mapping data. TOGA provides an important new resource for the analysis of gene function in eukaryotes. In addition, an analysis of the most widely represented sequences can begin to provide insight into eukaryotic biological processes.
Affiliation(s)
- Yuandan Lee
- The Institute for Genomic Research, Rockville, Maryland 20850, USA
|