Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 13092 (from Reference Citation Analysis)
Article PDFs: 3815
Cited by ≥ 1: 6797
Searched Name: Genomics/methods

Journal Articles

Rank	Citation Analysis	Article Type	Number of Years	Citation(s) in RCA
1	Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, Jensen LJ, Mering C. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 2019;47:D607-D613. [PMID: 30476243 PMCID: PMC6323986 DOI: 10.1093/nar/gky1131] [Citation(s) in RCA: 11399] [Impact Index Per Article: 1899.8] [Received: 09/28/2018] [Revised: 10/23/2018] [Accepted: 11/16/2018] [Indexed: 02/07/2023] Abstract: Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/. MeSH Headings: Animals; Databases, Genetic; Gene Ontology; Genomics/methods; Humans; Protein Interaction Mapping/methods; Software. Grants: P41 GM103504 NIGMS NIH HHS.	Research Support, N.I.H., Extramural	6	11399
2	Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. Integrative genomics viewer. Nat Biotechnol 2011;29:24-6. [PMID: 21221095 PMCID: PMC3346182 DOI: 10.1038/nbt.1754] [Citation(s) in RCA: 10408] [Impact Index Per Article: 743.4] [Indexed: 02/06/2023] MeSH Headings: Chromosome Mapping/methods; Computational Biology/methods; Computer Graphics; Gene Dosage; Gene Expression Profiling; Genomics/methods; Glioblastoma/genetics; Humans; Information Storage and Retrieval/methods; Internet; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis; Online Systems; Polymorphism, Single Nucleotide; Software. Grants: R01 GM074024 NIGMS NIH HHS; U54 HG003067 NHGRI NIH HHS; R21 CA135827-03S1 NCI NIH HHS; U54HG003067 NHGRI NIH HHS; R21 CA135827 NCI NIH HHS; R21CA135827 NCI NIH HHS; R01GM074024 NIGMS NIH HHS.	Letter	14	10408
3	Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, Wang W, Song H, Huang B, Zhu N, Bi Y, Ma X, Zhan F, Wang L, Hu T, Zhou H, Hu Z, Zhou W, Zhao L, Chen J, Meng Y, Wang J, Lin Y, Yuan J, Xie Z, Ma J, Liu WJ, Wang D, Xu W, Holmes EC, Gao GF, Wu G, Chen W, Shi W, Tan W. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet 2020;395:565-574. [PMID: 32007145 PMCID: PMC7159086 DOI: 10.1016/s0140-6736(20)30251-8] [Citation(s) in RCA: 7534] [Impact Index Per Article: 1506.8] [Received: 01/22/2020] [Revised: 01/26/2020] [Accepted: 01/27/2020] [Indexed: 12/02/2022] Abstract: BACKGROUND: In late December, 2019, patients presenting with viral pneumonia due to an unidentified microbial agent were reported in Wuhan, China. A novel coronavirus was subsequently identified as the causative pathogen, provisionally named 2019 novel coronavirus (2019-nCoV). As of Jan 26, 2020, more than 2000 cases of 2019-nCoV infection have been confirmed, most of which involved people living in or visiting Wuhan, and human-to-human transmission has been confirmed. METHODS: We did next-generation sequencing of samples from bronchoalveolar lavage fluid and cultured isolates from nine inpatients, eight of whom had visited the Huanan seafood market in Wuhan. Complete and partial 2019-nCoV genome sequences were obtained from these individuals. Viral contigs were connected using Sanger sequencing to obtain the full-length genomes, with the terminal regions determined by rapid amplification of cDNA ends. Phylogenetic analysis of these 2019-nCoV genomes and those of other coronaviruses was used to determine the evolutionary history of the virus and help infer its likely origin. Homology modelling was done to explore the likely receptor-binding properties of the virus. FINDINGS: The ten genome sequences of 2019-nCoV obtained from the nine patients were extremely similar, exhibiting more than 99·98% sequence identity. Notably, 2019-nCoV was closely related (with 88% identity) to two bat-derived severe acute respiratory syndrome (SARS)-like coronaviruses, bat-SL-CoVZC45 and bat-SL-CoVZXC21, collected in 2018 in Zhoushan, eastern China, but were more distant from SARS-CoV (about 79%) and MERS-CoV (about 50%). Phylogenetic analysis revealed that 2019-nCoV fell within the subgenus Sarbecovirus of the genus Betacoronavirus, with a relatively long branch length to its closest relatives bat-SL-CoVZC45 and bat-SL-CoVZXC21, and was genetically distinct from SARS-CoV. Notably, homology modelling revealed that 2019-nCoV had a similar receptor-binding domain structure to that of SARS-CoV, despite amino acid variation at some key residues. INTERPRETATION: 2019-nCoV is sufficiently divergent from SARS-CoV to be considered a new human-infecting betacoronavirus. Although our phylogenetic analysis suggests that bats might be the original host of this virus, an animal sold at the seafood market in Wuhan might represent an intermediate host facilitating the emergence of the virus in humans. Importantly, structural analysis suggests that 2019-nCoV might be able to bind to the angiotensin-converting enzyme 2 receptor in humans. The future evolution, adaptation, and spread of this virus warrant urgent investigation. FUNDING: National Key Research and Development Program of China, National Major Project for Control and Prevention of Infectious Disease in China, Chinese Academy of Sciences, Shandong First Medical University. MeSH Headings: Betacoronavirus/genetics; Betacoronavirus/metabolism; Bronchoalveolar Lavage Fluid/virology; COVID-19; China/epidemiology; Coronavirus Infections/diagnosis; Coronavirus Infections/epidemiology; Coronavirus Infections/transmission; Coronavirus Infections/virology; DNA, Viral/genetics; Disease Reservoirs/virology; Genome, Viral; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Humans; Phylogeny; Pneumonia, Viral/diagnosis; Pneumonia, Viral/epidemiology; Pneumonia, Viral/transmission; Pneumonia, Viral/virology; Receptors, Virus/metabolism; SARS-CoV-2; Sequence Alignment.	research-article	5	7534
4	Meier-Kolthoff JP, Auch AF, Klenk HP, Göker M. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 2013;14:60. [PMID: 23432962 PMCID: PMC3665452 DOI: 10.1186/1471-2105-14-60] [Citation(s) in RCA: 5056] [Impact Index Per Article: 421.3] [Received: 11/26/2012] [Accepted: 02/04/2013] [Indexed: 11/10/2022] Abstract: BACKGROUND: For the last 25 years species delimitation in prokaryotes (Archaea and Bacteria) was to a large extent based on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With the rapid progress in genome sequencing time has come to directly use the now available and easy to generate genome sequences for delimitation of species. GBDP (Genome Blast Distance Phylogeny) infers genome-to-genome distances between pairs of entirely or partially sequenced genomes, a digital, highly reliable estimator for the relatedness of genomes. Its application as an in-silico replacement for DDH was recently introduced. The main challenge in the implementation of such an application is to produce digital DDH values that must mimic the wet-lab DDH values as close as possible to ensure consistency in the Prokaryotic species concept. RESULTS: Correlation and regression analyses were used to determine the best-performing methods and the most influential parameters. GBDP was further enriched with a set of new features such as confidence intervals for intergenomic distances obtained via resampling or via the statistical models for DDH prediction and an additional family of distance functions. As in previous analyses, GBDP obtained the highest agreement with wet-lab DDH among all tested methods, but improved models led to a further increase in the accuracy of DDH prediction. Confidence intervals yielded stable results when inferred from the statistical models, whereas those obtained via resampling showed marked differences between the underlying distance functions. CONCLUSIONS: Despite the high accuracy of GBDP-based DDH prediction, inferences from limited empirical data are always associated with a certain degree of uncertainty. It is thus crucial to enrich in-silico DDH replacements with confidence-interval estimation, enabling the user to statistically evaluate the outcomes. Such methodological advancements, easily accessible through the web service at http://ggdc.dsmz.de, are crucial steps towards a consistent and truly genome sequence-based classification of microorganisms. Key Words: archaea; bacteria; blast; ddh; ggd; ggdc; gbdp; genomics; mummer; phylogeny; species concept; taxonomy. MeSH Headings: Archaea/classification; Archaea/genetics; Bacteria/classification; Bacteria/genetics; Confidence Intervals; DNA/chemistry; Genomics/methods; Models, Statistical; Nucleic Acid Hybridization/methods; Phylogeny; Regression Analysis; Sequence Analysis, DNA.	research-article	12	5056
5	Lagesen K, Hallin P, Rødland EA, Staerfeldt HH, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 2007;35:3100-8. [PMID: 17452365 PMCID: PMC1888812 DOI: 10.1093/nar/gkm160] [Citation(s) in RCA: 4803] [Impact Index Per Article: 266.8] [Indexed: 11/14/2022] Abstract: The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server. MeSH Headings: Computational Biology/methods; Genes, rRNA; Genome, Bacterial; Genomics/methods; Markov Chains; Software.	Research Support, Non-U.S. Gov't	18	4803
6	The International HapMap Project. Nature 2004;426:789-96. [PMID: 14685227 DOI: 10.1038/nature02168] [Citation(s) in RCA: 4299] [Impact Index Per Article: 204.7] [Indexed: 12/16/2022] Abstract: The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention. Key Words: Biomedical and Behavioral Research; Empirical Approach; Genetics and Reproduction. MeSH Headings: Base Sequence; DNA/genetics; Gene Frequency; Genetic Variation/genetics; Genome, Human; Genomics/methods; Haplotypes/genetics; Humans; International Cooperation; Polymorphism, Single Nucleotide/genetics; Public Sector; Racial Groups/genetics.	Research Support, U.S. Gov't, P.H.S.	21	4299
7	Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol 2004;5:R12. [PMID: 14759262 PMCID: PMC395750 DOI: 10.1186/gb-2004-5-2-r12] [Citation(s) in RCA: 3792] [Impact Index Per Article: 180.6] [Received: 10/27/2003] [Revised: 12/15/2003] [Accepted: 12/17/2003] [Indexed: 11/29/2022] Abstract: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at . MeSH Headings: Animals; Anopheles/genetics; Computer Graphics; Drosophila/genetics; Genome; Genome, Fungal; Genome, Human; Genomics/methods; Humans; Sequence Alignment/methods; Software. Grants: T32 GM008715 NIGMS NIH HHS; R01 LM006845 NLM NIH HHS; T32 GM08715 NIGMS NIH HHS; N01AI15447 NIAID NIH HHS; R01-LM06845 NLM NIH HHS.	Research Support, U.S. Gov't, P.H.S.	21	3792
8	Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RGPM, Granton P, Zegers CML, Gillies R, Boellard R, Dekker A, Aerts HJWL. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012;48:441-6. [PMID: 22257792 DOI: 10.1016/j.ejca.2011.11.036] [Citation(s) in RCA: 3709] [Impact Index Per Article: 285.3] [Received: 11/18/2011] [Accepted: 11/21/2011] [Indexed: 01/16/2023] Abstract: Solid cancers are spatially and temporally heterogeneous. This limits the use of invasive biopsy based molecular assays but gives huge potential for medical imaging, which has the ability to capture intra-tumoural heterogeneity in a non-invasive way. During the past decades, medical imaging innovations with new hardware, new imaging agents and standardised protocols, allows the field to move towards quantitative imaging. Therefore, also the development of automated and reproducible analysis methodologies to extract more information from image-based features is a requirement. Radiomics (the high-throughput extraction of large amounts of image features from radiographic images) addresses this problem and is one of the approaches that hold great promises but need further validation in multi-centric settings and in the laboratory. MeSH Headings: Algorithms; Diagnostic Imaging/methods; Diagnostic Imaging/statistics & numerical data; Diagnostic Imaging/trends; Genomics/methods; High-Throughput Screening Assays/methods; Humans; Image Processing, Computer-Assisted/methods; Image Processing, Computer-Assisted/statistics & numerical data; Models, Biological; Pattern Recognition, Automated/methods; Proteomics/methods; Radioactive Tracers; Radiometry/methods; Radiometry/statistics & numerical data. Grants: U01 CA143062 NCI NIH HHS.	Review	13	3709
9	Thorsson V, Gibbs DL, Brown SD, Wolf D, Bortone DS, Ou Yang TH, Porta-Pardo E, Gao GF, Plaisier CL, Eddy JA, Ziv E, Culhane AC, Paull EO, Sivakumar IKA, Gentles AJ, Malhotra R, Farshidfar F, Colaprico A, Parker JS, Mose LE, Vo NS, Liu J, Liu Y, Rader J, Dhankani V, Reynolds SM, Bowlby R, Califano A, Cherniack AD, Anastassiou D, Bedognetti D, Mokrab Y, Newman AM, Rao A, Chen K, Krasnitz A, Hu H, Malta TM, Noushmehr H, Pedamallu CS, Bullman S, Ojesina AI, Lamb A, Zhou W, Shen H, Choueiri TK, Weinstein JN, Guinney J, Saltz J, Holt RA, Rabkin CS, Lazar AJ, Serody JS, Demicco EG, Disis ML, Vincent BG, Shmulevich I. The Immune Landscape of Cancer. Immunity 2018;48:812-830.e14. [PMID: 29628290 PMCID: PMC5982584 DOI: 10.1016/j.immuni.2018.03.023] [Citation(s) in RCA: 3692] [Impact Index Per Article: 527.4] [Received: 07/21/2017] [Revised: 01/23/2018] [Accepted: 03/21/2018] [Indexed: 02/08/2023] Abstract: We performed an extensive immunogenomic analysis of more than 10,000 tumors comprising 33 diverse cancer types by utilizing data compiled by TCGA. Across cancer types, we identified six immune subtypes (wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, immunologically quiet, and TGF-β dominant) characterized by differences in macrophage or lymphocyte signatures, Th1:Th2 cell ratio, extent of intratumoral heterogeneity, aneuploidy, extent of neoantigen load, overall cell proliferation, expression of immunomodulatory genes, and prognosis. Specific driver mutations correlated with lower (CTNNB1, NRAS, or IDH1) or higher (BRAF, TP53, or CASP8) leukocyte levels across all cancers. Multiple control modalities of the intracellular and extracellular networks (transcription, microRNAs, copy number, and epigenetic processes) were involved in tumor-immune cell interactions, both across and within immune subtypes. Our immunogenomics pipeline to characterize these heterogeneous tumors and the resulting data are intended to serve as a resource for future targeted studies to further advance the field. Key Words: cancer genomics; immune subtypes; immuno-oncology; immunomodulatory; immunotherapy; integrative network analysis; tumor immunology; tumor microenvironment. MeSH Headings: Adolescent; Adult; Aged; Aged, 80 and over; Child; Female; Genomics/methods; Humans; Interferon-gamma/genetics; Interferon-gamma/immunology; Macrophages/immunology; Male; Middle Aged; Neoplasms/classification; Neoplasms/genetics; Neoplasms/immunology; Prognosis; Th1-Th2 Balance/physiology; Transforming Growth Factor beta/genetics; Transforming Growth Factor beta/immunology; Wound Healing/genetics; Wound Healing/immunology; Young Adult. Grants: U24 CA143866 NCI NIH HHS; P30 CA016086 NCI NIH HHS; U54 HG003273 NHGRI NIH HHS; P50 CA058223 NCI NIH HHS; U24 CA144025 NCI NIH HHS; U24 CA143843 NCI NIH HHS; U24 CA143848 NCI NIH HHS; HHSN261201400007C NCI NIH HHS; U01 CA217858 NCI NIH HHS; U24 CA210949 NCI NIH HHS; U24 CA143867 NCI NIH HHS; R50 CA221675 NCI NIH HHS; U24 CA210990 NCI NIH HHS; P30 ES010126 NIEHS NIH HHS; P30 CA016672 NCI NIH HHS; U24 CA143882 NCI NIH HHS; U54 CA209997 NCI NIH HHS; U54 HG003067 NHGRI NIH HHS; U24 CA143835 NCI NIH HHS; R01 LM009239 NLM NIH HHS; U24 CA180924 NCI NIH HHS; U24 CA210950 NCI NIH HHS; R01 CA184585 NCI NIH HHS; U24 CA143845 NCI NIH HHS; U24 CA143799 NCI NIH HHS; S10 OD012351 NIH HHS; U24 CA143840 NCI NIH HHS; U24 CA143858 NCI NIH HHS; P20 GM130423 NIGMS NIH HHS; HHSN261201400008C NCI NIH HHS; U24 CA210957 NCI NIH HHS; P30 CA045508 NCI NIH HHS; U54 HG003079 NHGRI NIH HHS; U24 CA210969 NCI NIH HHS; U24 CA143883 NCI NIH HHS; S10 OD021764 NIH HHS; R01 CA163722 NCI NIH HHS; R35 CA197745 NCI NIH HHS; K24 CA169004 NCI NIH HHS.	Research Support, N.I.H., Extramural	7	3692
10	UniProt: the universal protein knowledgebase. Nucleic Acids Res 2016;45:D158-D169. [PMID: 27899622 PMCID: PMC5210571 DOI: 10.1093/nar/gkw1099] [Citation(s) in RCA: 3318] [Impact Index Per Article: 368.7] [Received: 09/19/2016] [Revised: 10/25/2016] [Accepted: 11/05/2016] [Indexed: 02/06/2023] Abstract: The UniProt knowledgebase is a large resource of protein sequences and associated detailed annotation. The database contains over 60 million sequences, of which over half a million sequences have been curated by experts who critically review experimental and predicted data for each protein. The remainder are automatically annotated based on rule systems that rely on the expert curated knowledge. Since our last update in 2014, we have more than doubled the number of reference proteomes to 5631, giving a greater coverage of taxonomic diversity. We implemented a pipeline to remove redundant highly similar proteomes that were causing excessive redundancy in UniProt. The initial run of this pipeline reduced the number of sequences in UniProt by 47 million. For our users interested in the accessory proteomes, we have made available sets of pan proteome sequences that cover the diversity of sequences for each species that is found in its strains and sub-strains. To help interpretation of genomic variants, we provide tracks of detailed protein information for the major genome browsers. We provide a SPARQL endpoint that allows complex queries of the more than 22 billion triples of data in UniProt (http://sparql.uniprot.org/). UniProt resources can be accessed via the website at http://www.uniprot.org/. MeSH Headings: Computational Biology/methods; Databases, Protein; Genomics/methods; Proteome; Proteomics/methods; Web Browser. Grants: R13 GM109648 NIGMS NIH HHS; RG/13/5/30112 British Heart Foundation; R01 GM080646 NIGMS NIH HHS; U41 HG002273 NHGRI NIH HHS; G-1307 Parkinson's UK; U01 GM120953 NIGMS NIH HHS; P20 GM103446 NIGMS NIH HHS; Wellcome Trust; U41 HG007822 NHGRI NIH HHS.	Research Support, Non-U.S. Gov't	9	3318
11	Charoentong P, Finotello F, Angelova M, Mayer C, Efremova M, Rieder D, Hackl H, Trajanoski Z. Pan-cancer Immunogenomic Analyses Reveal Genotype-Immunophenotype Relationships and Predictors of Response to Checkpoint Blockade. Cell Rep 2017;18:248-262. [PMID: 28052254 DOI: 10.1016/j.celrep.2016.12.019] [Citation(s) in RCA: 3065] [Impact Index Per Article: 383.1] [Received: 08/22/2016] [Revised: 10/31/2016] [Accepted: 12/07/2016] [Indexed: 12/11/2022] Abstract: The Cancer Genome Atlas revealed the genomic landscapes of human cancers. In parallel, immunotherapy is transforming the treatment of advanced cancers. Unfortunately, the majority of patients do not respond to immunotherapy, making the identification of predictive markers and the mechanisms of resistance an area of intense research. To increase our understanding of tumor-immune cell interactions, we characterized the intratumoral immune landscapes and the cancer antigenomes from 20 solid cancers and created The Cancer Immunome Atlas (https://tcia.at/). Cellular characterization of the immune infiltrates showed that tumor genotypes determine immunophenotypes and tumor escape mechanisms. Using machine learning, we identified determinants of tumor immunogenicity and developed a scoring scheme for the quantification termed immunophenoscore. The immunophenoscore was a superior predictor of response to anti-cytotoxic T lymphocyte antigen-4 (CTLA-4) and anti-programmed cell death protein 1 (anti-PD-1) antibodies in two independent validation cohorts. Our findings and this resource may help inform cancer immunotherapy and facilitate the development of precision immuno-oncology. Key Words: cancer-germline antigens; neoantigens; predictive marker; tumor-infiltrating lymphocytes. MeSH Headings: Antigens, Neoplasm/immunology; CD4-Positive T-Lymphocytes/immunology; CD8-Positive T-Lymphocytes/immunology; CTLA-4 Antigen/metabolism; Cell Cycle Checkpoints; Genomics/methods; Genotype; Humans; Immunophenotyping; Immunotherapy; Machine Learning; Mutation/genetics; Neoplasms/genetics; Neoplasms/immunology; Prognosis; Programmed Cell Death 1 Receptor/metabolism.	Research Support, Non-U.S. Gov't	8	3065
12	Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzalez JR, Gratacos M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME. Global variation in copy number in the human genome. Nature 2006;444:444-54. [PMID: 17122850 PMCID: PMC2669898 DOI: 10.1038/nature05329] [Citation(s) in RCA: 3010] [Impact Index Per Article: 158.4] [Received: 06/13/2006] [Accepted: 10/10/2006] [Indexed: 01/08/2023] Abstract: Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies. MeSH Headings: Chromosome Mapping; Gene Dosage; Genetic Variation; Genetics, Population; Genome, Human; Genomics/methods; Genotype; Humans; Linkage Disequilibrium; Molecular Diagnostic Techniques; Oligonucleotide Array Sequence Analysis/methods; Polymorphism, Single Nucleotide. Grants: Wellcome Trust; 077008 Wellcome Trust; 077009 Wellcome Trust; 077014 Wellcome Trust.	research-article	19	3010
13	Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, Orchard R, Virgin HW, Listgarten J, Root DE. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol 2016;34:184-191. [PMID: 26780180 PMCID: PMC4744125 DOI: 10.1038/nbt.3437] [Citation(s) in RCA: 2868] [Impact Index Per Article: 318.7] [Received: 05/07/2015] [Accepted: 11/19/2015] [Indexed: 12/12/2022] Abstract: CRISPR-Cas9-based genetic screens are a powerful new tool in biology. By simply altering the sequence of the single-guide RNA (sgRNA), one can reprogram Cas9 to target different sites in the genome with relative ease, but the on-target activity and off-target effects of individual sgRNAs can vary widely. Here, we use recently devised sgRNA design rules to create human and mouse genome-wide libraries, perform positive and negative selection screens and observe that the use of these rules produced improved results. Additionally, we profile the off-target activity of thousands of sgRNAs and develop a metric to predict off-target sites. We incorporate these findings from large-scale, empirical data to improve our computational design rules and create optimized sgRNA libraries that maximize on-target activity and minimize off-target effects to enable more effective and efficient genetic screens and genome engineering. MeSH Headings: Animals; CRISPR-Cas Systems/genetics; Cell Line, Tumor; Drug Resistance/genetics; Gene Library; Genetic Engineering/methods; Genome/genetics; Genomics/methods; Humans; Mice; RNA, Guide, CRISPR-Cas Systems/genetics. Grants: K12 CA087723 NCI NIH HHS; T32 AI007163 NIAID NIH HHS; U19 AI109725 NIAID NIH HHS; 5K12CA087723-12 NCI NIH HHS.	Research Support, N.I.H., Extramural	9	2868
14	Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan MT, Carey VJ. Software for computing and annotating genomic ranges. PLoS Comput Biol 2013;9:e1003118. [PMID: 23950696 PMCID: PMC3738458 DOI: 10.1371/journal.pcbi.1003118] [Citation(s) in RCA: 2733] [Impact Index Per Article: 227.8] [Received: 01/28/2013] [Accepted: 05/07/2013] [Indexed: 11/23/2022] Abstract: We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization. MeSH Headings: Algorithms; Animals; Databases, Genetic; Genomics/methods; Genomics/standards; Humans; Mice; Sequence Alignment; Sequence Analysis, DNA; Software. Grants: R01 HL093076 NHLBI NIH HHS; R01 HL094635 NHLBI NIH HHS; P41 HG004059 NHGRI NIH HHS; U41 HG004059 NHGRI NIH HHS; R01 HL086601 NHLBI NIH HHS.	Research Support, N.I.H., Extramural	12	2733
15	Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc 2009;4:1184-91. [PMID: 19617889 PMCID: PMC3159387 DOI: 10.1038/nprot.2009.97] [Citation(s) in RCA: 2531] [Impact Index Per Article: 158.2] [Indexed: 12/12/2022] Abstract: Genomic experiments produce multiple views of biological systems, among them are DNA sequence and copy number variation, and mRNA and protein abundance. Understanding these systems needs integrated bioinformatic analysis. Public databases such as Ensembl provide relationships and mappings between the relevant sets of probe and target molecules. However, the relationships can be biologically complex and the content of the databases is dynamic. We demonstrate how to use the computational environment R to integrate and jointly analyze experimental datasets, employing BioMart web services to provide the molecule mappings. We also discuss typical problems that are encountered in making gene-to-transcript-to-protein mappings. The approach provides a flexible, programmable and reproducible basis for state-of-the-art bioinformatic data integration. Key Words: data integration; mapping identifiers; ensembl; biomart; bioconductor. MeSH Headings: Cell Line; Chromosome Mapping; Cluster Analysis; Databases, Genetic; Genomics/methods; Humans; RNA, Messenger/metabolism; Software. Grants: U24 CA126551 NCI NIH HHS; U24 CA126551-01 NCI NIH HHS.	Research Support, N.I.H., Extramural	16	2531
16	Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, Sabedot TS, Malta TM, Pagnotta SM, Castiglioni I, Ceccarelli M, Bontempi G, Noushmehr H. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res 2016;44:e71. [PMID: 26704973 PMCID: PMC4856967 DOI: 10.1093/nar/gkv1507] [Citation(s) in RCA: 2445] [Impact Index Per Article: 271.7] [Received: 10/22/2015] [Revised: 12/06/2015] [Accepted: 12/10/2015] [Indexed: 12/18/2022] [Open Access] Abstract: The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries. MeSH Headings: BRCA1 Protein/genetics; BRCA2 Protein/genetics; Biomarkers, Tumor/genetics; Computational Biology/methods; DNA Methylation/genetics; Data Mining/methods; Databases, Genetic; Genome, Human/genetics; Genomics/methods; Humans; Neoplasms/classification; Neoplasms/genetics; Statistics as Topic/methods.	research-article	9	2445
17	Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IMJ, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DMD, Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara E Catenazzi M, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW, Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Scott Furey W, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ, James T, Huw Jones TA, Kang GD, Kerelska TH, Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM, Martin JW, McCauley PG, McNitt P, Mehta P, Moon KW, Mullens JW, Newington T, Ning Z, Ling Ng B, Novo SM, O'Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Chris Pinkard D, Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Chiva Rodriguez A, Roe PM, Rogers J, Rogert Bacigalupo MC, Romanov N, Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA, Ernest Sohna Sohna J, Spence EJ, Stevens K, Sutton N, Szajkowski L, Tregidgo CL, Turcatti G, Vandevondele S, Verhovsky Y, Virk SM, Wakelin S, Walcott GC, Wang J, Worsley GJ, Yan J, Yau L, Zuerlein M, Rogers J, Mullikin JC, Hurles ME, McCooke NJ, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R, Smith AJ. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008;456:53-9. [PMID: 18987734 PMCID: PMC2581791 DOI: 10.1038/nature07517] [Citation(s) in RCA: 2434] [Impact Index Per Article: 143.2] [Received: 06/24/2008] [Accepted: 10/02/2008] [Indexed: 11/15/2022] Abstract: DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications. MeSH Headings: Chromosomes, Human, X/genetics; Consensus Sequence/genetics; Genome, Human/genetics; Genomics/economics; Genomics/methods; Genotype; Humans; Male; Nigeria; Polymorphism, Single Nucleotide/genetics; Sensitivity and Specificity; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/methods. Grants: Z01 HG200330-03 Intramural NIH HHS; Wellcome Trust; G0701805 Medical Research Council; B05823 Biotechnology and Biological Sciences Research Council; MOL04534 Biotechnology and Biological Sciences Research Council.	Research Support, N.I.H., Intramural	17	2434
18	Goldman MJ, Craft B, Hastie M, Repečka K, McDade F, Kamath A, Banerjee A, Luo Y, Rogers D, Brooks AN, Zhu J, Haussler D. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol 2020;38:675-678. [PMID: 32444850 PMCID: PMC7386072 DOI: 10.1038/s41587-020-0546-8] [Citation(s) in RCA: 2409] [Impact Index Per Article: 481.8] [Indexed: 02/06/2023] MeSH Headings: Databases, Genetic; Genomics/methods; Humans; Neoplasms/genetics; Software. Grants: U24 CA210974 NCI NIH HHS; U24 CA180951 NCI NIH HHS.	Letter	5	2409
19	Martí-Renom MA, Stuart AC, Fiser A, Sánchez R, Melo F, Sali A. Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 2001;29:291-325. [PMID: 10940251 DOI: 10.1146/annurev.biophys.29.1.291] [Citation(s) in RCA: 2376] [Impact Index Per Article: 99.0] [Indexed: 11/09/2022] Abstract: Comparative modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. The number of protein sequences that can be modeled and the accuracy of the predictions are increasing steadily because of the growth in the number of known protein structures and because of the improvements in the modeling software. Further advances are necessary in recognizing weak sequence-structure similarities, aligning sequences with structures, modeling of rigid body shifts, distortions, loops and side chains, as well as detecting errors in a model. Despite these problems, it is currently possible to model with useful accuracy significant parts of approximately one third of all known protein sequences. The use of individual comparative models in biology is already rewarding and increasingly widespread. A major new challenge for comparative modeling is the integration of it with the torrents of data from genome sequencing projects as well as from functional and structural genomics. In particular, there is a need to develop an automated, rapid, robust, sensitive, and accurate comparative modeling pipeline applicable to whole genomes. Such large-scale modeling is likely to encourage new kinds of applications for the many resulting models, based on their large number and completeness at the level of the family, organism, or functional network. MeSH Headings: Animals; Computer Simulation; Databases, Factual; Genome; Genomics/methods; Humans; Models, Biological; Models, Genetic; Models, Molecular; Proteins/chemistry. Grants: GM 54762 NIGMS NIH HHS.	Comparative Study	24	2376
20	Cho I, Blaser MJ. The human microbiome: at the interface of health and disease. Nat Rev Genet 2012;13:260-70. [PMID: 22411464 PMCID: PMC3418802 DOI: 10.1038/nrg3182] [Citation(s) in RCA: 2221] [Impact Index Per Article: 170.8] [Indexed: 02/08/2023] Abstract: Interest in the role of the microbiome in human health has burgeoned over the past decade with the advent of new technologies for interrogating complex microbial communities. The large-scale dynamics of the microbiome can be described by many of the tools and observations used in the study of population ecology. Deciphering the metagenome and its aggregate genetic information can also be used to understand the functional properties of the microbial community. Both the microbiome and metagenome probably have important functions in health and disease; their exploration is a frontier in human genetics. MeSH Headings: Arthritis, Rheumatoid/etiology; Bacteria/classification; Bacteria/genetics; Colon/microbiology; Gastrointestinal Tract/microbiology; Genomics/methods; Humans; Inflammatory Bowel Diseases/microbiology; Liver Diseases/etiology; Metagenome; Obesity/etiology. Grants: 5 P30 CA016087 NCI NIH HHS; P30 CA016087 NCI NIH HHS; 1UL1RR029893 NCRR NIH HHS; R01 GM063270 NIGMS NIH HHS; R01DK090989 NIDDK NIH HHS; R01GM63270 NIGMS NIH HHS; R01 DK090989 NIDDK NIH HHS; UL1 RR029893 NCRR NIH HHS; UH2 AR057506 NIAMS NIH HHS.	Research Support, N.I.H., Extramural	13	2221
21	Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res 2007;17:377-86. [PMID: 17255551 PMCID: PMC1800929 DOI: 10.1101/gr.5969107] [Citation(s) in RCA: 2062] [Impact Index Per Article: 114.6] [Indexed: 11/24/2022] Abstract: Metagenomics is the study of the genomic content of a sample of organisms obtained from a common habitat using targeted or random sequencing. Goals include understanding the extent and role of microbial diversity. The taxonomical content of such a sample is usually estimated by comparison against sequence databases of known sequences. Most published studies use the analysis of paired-end reads, complete sequences of environmental fosmid and BAC clones, or environmental assemblies. Emerging sequencing-by-synthesis technologies with very high throughput are paving the way to low-cost random "shotgun" approaches. This paper introduces MEGAN, a new computer program that allows laptop analysis of large metagenomic data sets. In a preprocessing step, the set of DNA sequences is compared against databases of known sequences using BLAST or another comparison tool. MEGAN is then used to compute and explore the taxonomical content of the data set, employing the NCBI taxonomy to summarize and order the results. A simple lowest common ancestor algorithm assigns reads to taxa such that the taxonomical level of the assigned taxon reflects the level of conservation of the sequence. The software allows large data sets to be dissected without the need for assembly or the targeting of specific phylogenetic markers. It provides graphical and statistical output for comparing different data sets. The approach is applied to several data sets, including the Sargasso Sea data set, a recently published metagenomic data set sampled from a mammoth bone, and several complete microbial genomes. Also, simulations that evaluate the performance of the approach for different read lengths are presented. MeSH Headings: Atlantic Ocean; Biodiversity; Computational Biology/methods; Ecosystem; Genetic Variation; Genome/genetics; Genomics/methods; Phylogeny; Software; Species Specificity.	Research Support, Non-U.S. Gov't	18	2062
22	Laslett D, Canback B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res 2004;32:11-6. [PMID: 14704338 PMCID: PMC373265 DOI: 10.1093/nar/gkh152] [Citation(s) in RCA: 1985] [Impact Index Per Article: 94.5] [Indexed: 11/14/2022] [Open Access] Abstract: A computer program, ARAGORN, identifies tRNA and tmRNA genes. The program employs heuristic algorithms to predict tRNA secondary structure, based on homology with recognized tRNA consensus sequences and ability to form a base-paired cloverleaf. tmRNA genes are identified using a modified version of the BRUCE program. ARAGORN achieves a detection sensitivity of 99% from a set of 1290 eubacterial, eukaryotic and archaeal tRNA genes and detects all complete tmRNA sequences in the tmRNA database, improving on the performance of the BRUCE program. Recently discovered tmRNA genes in the chloroplasts of two species from the 'green' algae lineage are detected. The output of the program reports the proposed tRNA secondary structure and, for tmRNA genes, the secondary structure of the tRNA domain, the tmRNA gene sequence, the tag peptide and a list of organisms with matching tmRNA peptide tags. MeSH Headings: Algorithms; Base Sequence; Computational Biology/methods; Genomics/methods; Internet; Molecular Sequence Data; Nucleic Acid Conformation; RNA, Bacterial/chemistry; RNA, Bacterial/genetics; RNA, Transfer/chemistry; RNA, Transfer/genetics; Sensitivity and Specificity; Software.	Journal Article	21	1985
23	Meier-Kolthoff JP, Göker M. TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy. Nat Commun 2019;10:2182. [PMID: 31097708 PMCID: PMC6522516 DOI: 10.1038/s41467-019-10210-3] [Citation(s) in RCA: 1889] [Impact Index Per Article: 314.8] [Received: 08/17/2018] [Accepted: 04/29/2019] [Indexed: 02/07/2023] [Open Access] Abstract: Microbial taxonomy is increasingly influenced by genome-based computational methods. Yet such analyses can be complex and require expert knowledge. Here we introduce TYGS, the Type (Strain) Genome Server, a user-friendly high-throughput web server for genome-based prokaryote taxonomy, connected to a large, continuously growing database of genomic, taxonomic and nomenclatural information. It infers genome-scale phylogenies and state-of-the-art estimates for species and subspecies boundaries from user-defined and automatically determined closest type genome sequences. TYGS also provides comprehensive access to nomenclature, synonymy and associated taxonomic literature. Clinically important examples demonstrate how TYGS can yield new insights into microbial classification, such as evidence for a species-level separation of previously proposed subspecies of Salmonella enterica. TYGS is an integrated approach for the classification of microbes that unlocks novel scientific approaches to microbiologists worldwide and is particularly helpful for the rapidly expanding field of genome-based taxonomic descriptions of new genera, species or subspecies. Key Words: phylogeny; taxonomy; microbiology. MeSH Headings: Archaea/classification; Archaea/genetics; Bacteria/classification; Bacteria/genetics; Databases, Genetic; Genome, Archaeal/genetics; Genome, Bacterial/genetics; Genomics/methods; Phylogeny. Grants: TRR 51 Deutsche Forschungsgemeinschaft (German Research Foundation).	research-article	6	1889
24	Schattner P, Brooks AN, Lowe TM. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res 2005;33:W686-9. [PMID: 15980563 PMCID: PMC1160127 DOI: 10.1093/nar/gki366] [Citation(s) in RCA: 1849] [Impact Index Per Article: 92.5] [Indexed: 02/06/2023] [Open Access] Abstract: Transfer RNAs (tRNAs) and small nucleolar RNAs (snoRNAs) are two of the largest classes of non-protein-coding RNAs. Conventional gene finders that detect protein-coding genes do not find tRNA and snoRNA genes because they lack the codon structure and statistical signatures of protein-coding genes. Previously, we developed tRNAscan-SE, snoscan and snoGPS for the detection of tRNAs, methylation-guide snoRNAs and pseudouridylation-guide snoRNAs, respectively. tRNAscan-SE is routinely applied to completed genomes, resulting in the identification of thousands of tRNA genes. Snoscan has successfully detected methylation-guide snoRNAs in a variety of eukaryotes and archaea, and snoGPS has identified novel pseudouridylation-guide snoRNAs in yeast and mammals. Although these programs have been quite successful at RNA gene detection, their use has been limited by the need to install and configure the software packages on UNIX workstations. Here, we describe online implementations of these RNA detection tools that make these programs accessible to a wider range of research biologists. The tRNAscan-SE, snoscan and snoGPS servers are available at http://lowelab.ucsc.edu/tRNAscan-SE/, http://lowelab.ucsc.edu/snoscan/ and http://lowelab.ucsc.edu/snoGPS/, respectively. MeSH Headings: Genomics/methods; Internet; RNA, Small Nucleolar/genetics; RNA, Transfer/genetics; Software; User-Computer Interface.	research-article	20	1849
25	Jin JJ, Yu WB, Yang JB, Song Y, dePamphilis CW, Yi TS, Li DZ. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol 2020;21:241. [PMID: 32912315 PMCID: PMC7488116 DOI: 10.1186/s13059-020-02154-5] [Citation(s) in RCA: 1756] [Impact Index Per Article: 351.2] [Received: 01/16/2020] [Accepted: 08/24/2020] [Indexed: 12/13/2022] [Open Access] Abstract: GetOrganelle is a state-of-the-art toolkit to accurately assemble organelle genomes from whole genome sequencing data. It recruits organelle-associated reads using a modified "baiting and iterative mapping" approach, conducts de novo assembly, filters and disentangles the assembly graph, and produces all possible configurations of circular organelle genomes. For 50 published plant datasets, we are able to reassemble the circular plastomes from 47 datasets using GetOrganelle. GetOrganelle assemblies are more accurate than published and/or NOVOPlasty-reassembled plastomes as assessed by mapping. We also assemble complete mitochondrial genomes using GetOrganelle. GetOrganelle is freely released under a GPL-3 license (https://github.com/Kinggerm/GetOrganelle). Key Words: Assembler; Assembly graph; Mitogenome; Organelle genome; Plastome. MeSH Headings: Genome, Mitochondrial; Genome, Plant; Genome, Plastid; Genomics/methods; Software. Grants: Strategic Priority Research Program of the Chinese Academy of Sciences; National Natural Science Foundation of China; Chinese Academy of Sciences Large-scale Scientific Facilities; the open research project of “Cross-Cooperative Team” of the Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences; CAS 135 Program.	Evaluation Study	5	1756
